if-throw precondition check effectiveness and the DRY principle - C++

A lot of internet resources insist on checking preconditions in API functions via if (something_is_wrong) throw Exception{} instead of assert(!something_is_wrong), and I see some good points in that. However, I'm afraid such usage may result in duplicating the same checks:
void foo(int positive) {
    if (positive < 1) {
        throw std::invalid_argument("param1 must be positive");
    }
}

void caller() {
    int number = get_number_somehow();
    if (number >= 1) {
        foo(number);
    }
}
will probably execute like
int number = get_number_somehow();
if (number >= 1) {
    if (number < 1) {
        throw std::invalid_argument("param must be positive");
    }
}
unless the call is actually inlined and one of the ifs optimized away, I guess. Besides, writing the check twice (in foo() and in caller()) might violate the DRY rule. Therefore, maybe I should go for
void caller() {
    int number = get_number_somehow();
    try {
        foo(number);
    } catch (std::invalid_argument const&) {
        // handle or whatever
    }
}
to avoid repeating myself with those precondition checks, gaining a bit of performance and a lot of maintainability in case the function's contract changes.
However, I can't always apply such logic. Imagine std::vector having only at() but not operator[]:
for (int i = 0; i < vector.size(); ++i) {
    bar(vector.at(i)); // each index is checked twice: by the loop and by at()
}
This code results in extra O(N) checks! Isn't it too much? Even if it is optimized out the same way as above, what about such situations with indirect calls or long functions, which probably won't be inlined?
So, should my program be written according to the rules below?
if an API function probably won't be inlined or is expected to be called a lot of times with checks on the call site (see the vector example), assert() its preconditions (inside it);
wrap calls to throwing functions in try-catch instead of checking their preconditions before the call (the latter seems to break DRY).
If not, why?

So, there are two separate things you are talking about: DRY and performance.
DRY is about code maintenance and structure, and doesn't really apply to code you can't control. So if the API is a black box, and there happens to be code inside it that you can't change but need to have separately, then I wouldn't consider it a DRY violation to repeat it in your code. The "Y" is Yourself.
But you could still care about performance. If you measure a performance problem, then fix it with whatever makes sense, even if the fix is anti-DRY.
But if you control both sides (the API and the client) and you really want a pure, no-repeat, performant solution, then there's a pattern like the following pseudocode. I don't know its name, but I think of it as "proof providing":
let fn = precondition_check(myNum)
if fn != nil {
    // the existence of fn is proof that myNum meets preconditions
    fn()
}
The API function precondition_check returns a function that captures myNum and doesn't need to check the preconditions again, because it is only created if they hold.
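A minimal C++ sketch of this pattern, under my own naming (make_checked_foo, and the use of std::optional plus std::function, are illustrative choices, not from the answer):

#include <cstdio>
#include <functional>
#include <optional>

// Hypothetical factory: yields a callable only when n satisfies the
// precondition, so the callable's mere existence proves the check passed.
std::optional<std::function<void()>> make_checked_foo(int n) {
    if (n < 1) return std::nullopt; // precondition failed: no callable
    return [n] {
        // n is guaranteed positive here; no second check needed
        std::printf("foo(%d)\n", n);
    };
}

int main() {
    int number = 5; // stand-in for get_number_somehow()
    if (auto foo = make_checked_foo(number)) {
        (*foo)(); // the precondition was checked exactly once, at creation
    }
}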

Related

Does this function have explicit return values on all control paths?

I have a Heaviside step function centered on unity for any data type, which I've encoded using:
template <typename T>
int h1(const T& t) {
    if (t < 1) {
        return 0;
    } else if (t >= 1) {
        return 1;
    }
}
In code review, my reviewer told me that there is not an explicit return on all control paths. And the compiler does not warn me either. But I don't agree; the conditions are mutually exclusive. How do I deal with this?
It depends on how the template is used. For an int, you're fine.
But if t is an IEEE 754 floating-point double with the value NaN, neither t < 1 nor t >= 1 is true, so program control reaches the end of the function! The function then returns without an explicit value, and the behaviour of that is undefined.
(In the more general case, where T overloads the < and >= operators in such a way as to not cover all possibilities, program control will likewise reach the end of the function with no explicit return.)
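A quick demonstration of the NaN case (my own minimal sketch, not from the answer):

#include <cmath>
#include <cstdio>

int main() {
    double t = std::nan("");
    // Both comparisons are false for NaN, so h1(t) would fall off the end.
    std::printf("%d %d\n", t < 1.0 ? 1 : 0, t >= 1.0 ? 1 : 0); // prints: 0 0
}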
The moral of the story here is to decide on which branch should be the default, and make that one the else case.
Just because code is correct, that doesn't mean it can't be better. Correct execution is the first step in quality, not the last.
if (t < 1) {
    return 0;
} else if (t >= 1) {
    return 1;
}
The above is "correct" for any datatype of t than has sane behavior for < and >=. But this:
if (t < 1) {
    return 0;
}
return 1;
is easier to verify by inspection that every case is covered, and it avoids the second, unneeded comparison altogether (which some compilers might not optimize out). Code is not only read by compilers, but by humans, including you 10 years from now. Give the humans a break and write more simply for their understanding as well.
As noted, for some special values (like NaN) neither < nor >= holds, so your reviewer is simply right.
The question is: what made you want to code it like this in the first place? Why do you even consider making life so hard for yourself and others (the people that need to maintain your code)? Just the fact that you are smart enough to deduce that < and >= should cover all cases doesn't mean that you have to make the code more complex than necessary. What goes for physics goes for code too: make things as simple as possible, but not simpler (I believe Einstein said this).
Think about it: what are you trying to achieve? It must be something like "return 0 if the input is less than 1, return 1 otherwise". What you've done is add cleverness by saying: oh, but that means I return 1 if t is greater than or equal to 1. This sort of needless "x implies y" demands extra thinking from the maintainer. If you think that is a good thing, I'd advise doing a couple of years of code maintenance yourself.
If it were my review, I'd make another remark. If you use an if statement, you can basically do anything you want in each branch. But in this case you don't do "anything": all you want is to return 0 or 1 depending on whether t < 1 or not. In such cases, I think the ?: operator is much better and more readable than an if statement. Thus:
return t < 1 ? 0 : 1;
I know the ?: operator is forbidden in some companies, and I find that a horrible policy. ?: usually matches specifications much more closely, and it can make code so much easier to read (if used with care).

Using elses in boolean functions in C++

Let's say I have a simple function that checks a condition and returns true if the condition is true and false if the condition is false.
Is it better to use this type of code:
bool myfunction( /*parameters*/ ) {
    if ( /*conditional statement*/ ) {
        return true;
    }
    return false;
}
Or this type:
bool myfunction( /*parameters*/ ) {
    if ( /*conditional statement*/ ) {
        return true;
    }
    else return false;
}
Or does it just really not make a difference? Also, what considerations should I bear in mind when deciding whether to "if...else if" vs. "if...else" vs. "switch"?
You can also write this without any conditional at all:

bool myfunction( /*parameters*/ ) {
    return /*conditional statement*/;
}
Of course, if you are dealing with a different function where you need the conditional, it shouldn't make a difference. Modern compilers work well either way.
As for using switch vs. if-else: a switch adds efficiency when you have many cases by allowing the program to jump directly to the matching one, making execution faster by not testing every case. At a low (hardware/compiler) level, a switch statement can be compiled into a single check and indexed jump, whereas many if statements require many checks and jumps.
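For illustration, a dense switch like this hypothetical one typically lets the compiler emit a jump table:

// Dense, contiguous cases can compile to one indexed jump instead of
// a chain of comparisons (whether they do is up to the compiler).
int category(int code) {
    switch (code) {
        case 0: return 10;
        case 1: return 20;
        case 2: return 30;
        case 3: return 40;
        default: return -1;
    }
}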
It is the same. Remember that whenever you write

return boolean;

the function ends and control returns to the call site. Therefore, putting the return inside an else or just writing it plainly is the same.
Say we want to check whether a number is prime:
#include <cmath> // for sqrt

bool isPrime(int n) { // note: assumes n >= 2
    for (int i = 2; i <= std::sqrt(n); i++) {
        if (n % i == 0)
            return false;
    }
    return true;
}
If you look at the function closely, you will see that if the number divides evenly by any value in the range up to sqrt(n), it returns false, because the number is not prime.
If it cannot be so divided, the loop ends without interruption and the number is declared prime. Hence the function works properly.
Since neither of the two given answers hits the nail on the head, I will give you another one.
From the code's (or compiler's) point of view, and assuming a recent compiler, both versions are identical: the compiler will optimize the if version into the return version just fine. The difference is in debugging. The debugger you're using might not allow you to set a breakpoint on a return value (for example, if you want to break only when true is returned), while the if version gives you two return statements on different lines, and any sane debugger can set a breakpoint on a line just fine.
Both functions are identical, regardless of any optimizations applied by the compiler, because the "else" in the second function has no effect. If you leave the function as soon as the condition is met, you'll never enter the other branch in that case, so the "else" is implicit in the first version.
Hence I'd prefer the first version, because the "else" in the other one is misleading.
However, I agree with others here that this kind of function (in either variant) doesn't make sense anyway, because you can simply use the plain boolean condition instead of the function, which is just a needless wrapper.
In terms of compilation, the specific form you choose for the if-else syntax won't make a big difference; the optimizer will usually erase any differences. Your decision should be made based on visual form instead.
As others have pointed out already, if you have a simple condition like this it's best to just return the calculation directly and avoid the if statement.
Returning directly only works if you have a boolean calculation. You might instead need to return a different type:
int foo(/*args*/) {
    if (/*condition*/) {
        return bar();
    }
    return 0;
}
Alternatively, you could use the ternary operator ?:, but if the expressions are complex, it may not be as clear.
By using short returns (evaluation doesn't reach the end of the function) you can also sequence several conditions and evaluations.
int foo(/*args*/) {
    if (/*condition1*/) {
        return 0;
    }
    if (/*condition2*/) {
        return 3;
    }
    int common = bar(/*args*/);
    if (/*condition3*/) {
        return 1 - common;
    }
    return common;
}
Pick the form based on what makes the most logical sense, and ignore how it might compile. Then consider massaging the form to have the least visual complexity (avoiding too much indentation or deep branching).

Is using std::out_of_range for logic bad?

In my project, I have a lot of situations like this:
constexpr size_t element_count = 42;
std::array<bool, element_count> elements;
for (size_t i = 0; i < element_count; ++i) {
    if (i > 0 && elements[i - 1]) { /*do something*/ }
    else { /*do something else*/ }
    if (i < element_count - 1 && elements[i + 1]) { /*do something*/ }
    else { /*do something else*/ }
}
Without checking if i > 0 or i < element_count, I'll get undefined behavior. If I use std::array::at instead of operator[], I can get std::out_of_range exceptions instead. I was wondering if there were any problems with just relying on the exception like this:
for (size_t i = 0; i < element_count; ++i) {
    try {
        if (elements.at(i - 1)) { /*do something*/ }
    }
    catch (const std::out_of_range& e) { /*do something else*/ }
    try {
        if (elements.at(i + 1)) { /*do something*/ }
    }
    catch (const std::out_of_range& e) { /*do something else*/ }
}
In this example it's more code, but in my real project it would reduce the amount of code because I'm using lots of multidimensional arrays and doing bounds checking for multiple dimensions.
There isn't a problem in the sense that it will work, but that's about it. Using exceptions for basic flow control (which is what you seem to be doing here) is usually frowned upon, with reason, and I don't think I've ever seen it used like this in a loop:
it makes reading and reasoning about the code harder, partly because using exceptions for flow control (instead of for error handling, which is what they are meant for in C++) is unexpected;
harder to read usually also means harder to write, and makes it harder to spot mistakes;
you actually made a mistake already, or at least introduced a behaviour change: i > 0 && elements[i - 1] evaluating to false no longer results in 'do something else' being called;
reducing the amount of code isn't a worthwhile goal if it results in less readable or worse code;
it might be less performant.
Now it would be interesting to see the actual code, but I suspect it could probably do without any bounds checking whatsoever, e.g. by making the loop start at 1 instead of 0. Or, if this is a recurring pattern, you could write a helper function (or use an existing one) for iterating with access to multiple elements per iteration. That would be a reduction in the amount of code which is actually worth it.
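A minimal sketch of that restructuring, reusing the names from the question (the peeled boundary iterations are my own illustration, not the answerer's code):

#include <array>
#include <cstddef>

constexpr std::size_t element_count = 42;

// Peel the first and last iterations so the main loop needs no bounds
// checks at all; the comments stand in for the question's actions.
void process(const std::array<bool, element_count>& elements) {
    /* i == 0: no left neighbour, so "do something else" for that side */
    if (elements[1]) { /* do something */ } else { /* do something else */ }

    for (std::size_t i = 1; i + 1 < element_count; ++i) {
        if (elements[i - 1]) { /* do something */ } else { /* do something else */ }
        if (elements[i + 1]) { /* do something */ } else { /* do something else */ }
    }

    /* i == element_count - 1: the left neighbour always exists... */
    if (elements[element_count - 2]) { /* do something */ } else { /* do something else */ }
    /* ...but there is no right neighbour, so "do something else" for that side */
}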

Helper functions: lambdas vs normal functions

I have a function which internally uses some helper functions to keep its body organized and clean. They're very simple (though not always short), and there are more than just two of them; they could easily be inlined into the function's body, but I don't want to do that, because, as I said, I want to keep that function's body organized.
All those functions need to be passed some arguments by reference and modify them, and I can write them in two ways (just a silly example):
With normal functions:
void helperf1(int &count, int &count2) {
    count += 1;
    count2 += 2;
}

int helperf2(int &count, int &count2) {
    return (count++) * (count2--);
}

// actual, important function
void myfunc(...) {
    int count = 0, count2 = 0;
    while (...) {
        helperf1(count, count2);
        printf("%d\n", helperf2(count, count2));
    }
}
Or with lambda functions that capture those arguments I explicitly pass in the example above:
void myfunc(...) {
    int count = 0, count2 = 0;
    auto helperf1 = [&count, &count2]() -> void {
        count += 1;
        count2 += 2;
    };
    auto helperf2 = [&count, &count2]() -> int {
        return (count++) * (count2--);
    };
    while (...) {
        helperf1();
        printf("%d\n", helperf2());
    }
}
However, I am not sure which method I should use. With the first one, there is the "overhead" of passing the arguments (I think), while with the second, those arguments are already captured, so that "overhead" goes away. But they're still lambdas, which (I think, again) might not be as fast as normal functions.
So what should I do? Use the first method? Use the second one? Or sacrifice readability and just inline them in the main function's body?
Your first and foremost concern should be readability (and maintainability)!
Which of regular or lambda functions is more readable strongly depends on the given problem (and a bit on the taste of the reader/maintainer).
Don't be concerned about performance until you find that performance actually is an issue! If performance is an issue, start by benchmarking, not by guessing which implementation you think is faster (in many situations compilers are pretty good at optimizing).
Performance-wise, there is no real issue here. Nothing to decide; choose whichever.
But lambda expressions won't do you any good for the purpose you have in mind.
They won't make the code any cleaner.
As a matter of fact, I believe they will make the code a bit harder to read, compared to a nice calculator object that has these helper functions as properly named member functions with clean semantics and a clean interface.
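A minimal sketch of such a calculator object, reusing the helper bodies from the question (the name Counters and its member names are invented for illustration):

#include <cstdio>

struct Counters {
    int count = 0;
    int count2 = 0;

    void step() {                  // plays the role of helperf1
        count += 1;
        count2 += 2;
    }
    int product() {                // plays the role of helperf2
        return (count++) * (count2--);
    }
};

int main() {
    Counters c;
    for (int i = 0; i < 3; ++i) {  // stand-in for the question's while loop
        c.step();
        std::printf("%d\n", c.product());
    }
}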
Using lambdas can be more readable, but they are actually there for more serious reasons: lambda expressions, also known as "anonymous functions", are very useful in certain programming paradigms, particularly functional programming, which builds on the lambda calculus (http://en.wikipedia.org/wiki/Lambda_calculus).
Here you can find the goals of using lambdas :
https://dzone.com/articles/why-we-need-lambda-expressions
If you won't need the two helper functions anywhere else in your code, then use the lambda approach. But if you will call one of them again somewhere in your project, avoid rewriting them each time as lambdas: put them in a header file called "helpers.(h/hpp)" and a source file called "helpers.(c/cpp)" and append all the helper functions there. Then you gain readability in both the helper file and the caller file.
You can also avoid this unskilled habit and challenge yourself by writing complex code that you have to read more than once each time you want to edit it; that increases your programming skills, and if you are working in a team it won't be a problem: use comments, and they will show more respect for your programming skills (provided your complex code produces the expected behaviour and output).
And don't be concerned about performance unless you find yourself writing a performance-critical algorithm; if not, the difference will be a few milliseconds and the user won't notice it, so you would be wasting your time on an optimization the compiler can do by itself most of the time if you ask it to optimize your code.

Techniques to avoid minimal scope inefficiency with complex objects in loops in C++?

Question First
Is there an elegant solution in C++ to prevent one from having to declare complex object variables that are only used within a loop outside of the loop for efficiency reasons?
Detailed explanation
A colleague has raised an interesting point regarding our code policy, which states (paraphrased): always use minimal scope for variables and declare each variable at its first initialization.
Coding Guide Example:
// [A] DO THIS
void f() {
    ...
    for (int i = 0; i != n; ++i) {
        const double x = calculate_x(i);
        set_squares(i, x*x);
    }
    ...
}

// [B] DON'T do this:
void f() {
    int i;
    int n;
    double x;
    ...
    for (i = 0; i != n; ++i) {
        x = calculate_x(i);
        set_squares(i, x*x);
    }
    ...
}
This is all nice and well, and there's certainly nothing wrong with it, until you move from primitive types to objects (with a certain kind of interface).
Example:
// [C]
void fs() {
    ...
    for (int i = 0; i != n; ++i) {
        string s;
        get_text(i, s); // void get_text(int, string&);
        to_lower(s);
        set_lower_text(i, s);
    }
    ...
}
Here, the string s will be destructed and its memory released on every loop cycle, and on every cycle the get_text function will have to freshly allocate the memory for the s buffer.
It would be clearly more efficient to write:
// [D]
string s;
for (int i = 0; i != n; ++i) {
    get_text(i, s); // void get_text(int, string&);
    to_lower(s);
    set_lower_text(i, s);
}
as now the allocated memory in the s buffer will be preserved between loop runs and it is very likely that we'll save on allocations.
Disclaimer: Please note: since this is about loops and we're talking memory allocations, I do not consider it premature optimization to think about this problem in general. Certainly there are cases and loops where the overhead wouldn't matter; but n has the nagging tendency to be larger than the dev initially expects, and the code has the nagging tendency to be run in contexts where performance does matter.
Anyway, so now the more efficient way for the "general" loop construct is to violate code locality and declare complex objects out of place, "just in case". This makes me rather uneasy.
Note that I consider writing it like this:
// [E]
void fs() {
    ...
    {
        string s;
        for (int i = 0; i != n; ++i) {
            get_text(i, s); // void get_text(int, string&);
            to_lower(s);
            set_lower_text(i, s);
        }
    }
    ...
}
is no solution as readability suffers even more!
Thinking further, the interface of the get_text function is non-idiomatic anyway, as out-params are so yesterday, and a "good" interface would return by value:
// [F]
for (int i = 0; i != n; ++i) {
    string s = get_text(i); // string get_text(int);
    to_lower(s);
    set_lower_text(i, s);
}
Here, we do not pay double for memory allocation, because it is extremely likely that s will be constructed via RVO from the return value, so for [F] we pay the same in allocation overhead as in [C]. Unlike the [C] case however, we can't optimize this interface variant.
So the bottom line seems to be: using minimal scope can hurt performance, and using clean interfaces (I consider return-by-value a lot cleaner than that out-ref-param stuff) can prevent optimization opportunities -- at least in the general case.
The problem isn't so much that one sometimes has to forgo clean code for efficiency; the problem is that as soon as devs start to find such special cases, the whole coding guide (see [A], [B]) loses authority.
The question now would be: see first paragraph
It would be clearly more efficient to write: [start of example D ...]
I doubt this bit. You're paying for the default construction to begin with, outside the loop. Within the loop, there is a possibility that get_text reallocates the buffer (it depends on how your get_text and the string are defined). Note that for some runs you may actually see an improvement (say, in the case where you get progressively shorter strings), and for others (where the string lengths grow by about a factor of 2 at every iteration) a huge hit in performance.
It makes perfect sense to hoist invariants out of your loop should they pose a bottleneck (which a profiler will tell you). Otherwise, go for code that is idiomatic.
I'd either:
make an exception to the rule for these heavyweights, like [D], and note that you can restrict the scope as desired;
permit a helper function (the string could also be a parameter);
or, if you really didn't like those, declare a local in your for loop's scope using a multi-element object that holds both your counter/iterator and the temporary; std::pair<int,std::string> would be one option, although a specialized container could reduce the syntactic noise (see the sketch below).
(and the out parameter would be faster than RVO-style in many cases)
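A minimal sketch of that third option (my own illustration; the three functions are only declared here, with signatures assumed from the question):

#include <string>
#include <utility>

void get_text(int, std::string&);             // assumed from the question
void to_lower(std::string&);                  // assumed from the question
void set_lower_text(int, const std::string&); // assumed from the question

void fs(int n) {
    // counter and reusable buffer share exactly the loop's scope
    for (std::pair<int, std::string> p(0, std::string()); p.first != n; ++p.first) {
        std::string& s = p.second; // capacity persists across iterations
        get_text(p.first, s);
        to_lower(s);
        set_lower_text(p.first, s);
    }
}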
Depends on the implementation of get_text.
If you can implement it so it reuses the space allocated in the string object most of the time, then definitely declare the object outside the loop to avoid new dynamic memory allocation at each loop iteration.
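For example, a get_text written along these lines (a hypothetical sketch, not from the answer) reuses whatever capacity the caller's string already has:

#include <string>

void get_text(int i, std::string& out) {
    out.clear();               // clear() keeps the existing capacity
    out += "item ";            // += reallocates only if capacity runs out
    out += std::to_string(i);
}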
Dynamic allocation is expensive (the best single-threaded allocators need about 40 instructions for a single allocation; multi-threading adds overhead, and not all allocators are "best"), and it can fragment memory.
(BTW, std::string typically implements the so-called "small string optimization", which avoids dynamic allocation for small strings. So if you know most of your strings will be small enough, and the implementation of std::string won't change, you could theoretically avoid dynamic allocation even when constructing a new object in each iteration. That would be very fragile, however, so I'd recommend against it.)
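A quick way to observe SSO on a given implementation (a minimal sketch; the inline capacity is implementation-defined, commonly 15 or 22):

#include <iostream>
#include <string>

int main() {
    std::string s;
    // With SSO, a default-constructed string already has room for a few
    // characters in place; strings up to capacity() need no heap allocation.
    std::cout << "inline capacity: " << s.capacity() << '\n';
}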
In the general case, it all depends on how your objects and the functions that use them are implemented. If you care about performance, you'll have to deal with these kinds of "abstraction leaks" on a case-by-case basis. So pick your battles wisely: measure and optimize bottlenecks first.
If you have a copy-on-write implementation of the string class, then to_lower(s) will allocate memory anyway, so it is not clear that you can gain performance by simply declaring s outside the loop.
In my opinion, there are two possibilities:
1.) You have a class whose constructor does something non-trivial which need not be re-done in each iteration. Then it is logically straightforward to put the declaration outside the loop.
2.) You have a class whose constructor does not do anything useful, then put the declaration inside the loop.
If 1. is true, then you should probably split your object into a helper object which, e.g., allocates space and does non-trivial initializations, and a flyweight object. Something like the following:
StringReservedMemory m(500); /* base object for something complex, allocating 500 bytes of space */
for (...) {
    MyOptimizedStringImplementation s(m);
    ...
}
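A concrete sketch of that split in portable C++ (Buffer and Scratch are invented stand-ins for StringReservedMemory and MyOptimizedStringImplementation):

#include <cstddef>
#include <cstdio>
#include <string>

// Buffer does the one-time allocation work; Scratch is the cheap
// per-iteration flyweight that borrows its storage.
struct Buffer {
    std::string storage;
    explicit Buffer(std::size_t reserve_bytes) { storage.reserve(reserve_bytes); }
};

struct Scratch {
    std::string& s;
    explicit Scratch(Buffer& b) : s(b.storage) { s.clear(); } // clear() keeps capacity
};

int main() {
    Buffer m(500);              // the non-trivial work happens once
    for (int i = 0; i < 3; ++i) {
        Scratch scratch(m);     // trivial per-iteration construction
        scratch.s += "iteration";
        std::printf("capacity stays at %zu\n", scratch.s.capacity());
    }
}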