In Calcite, after the default VolcanoPlanner has run, we get an optimized RelNode, but can we optimize it further? For example, I want to add an ElasticsearchSort or something similar to limit the dataset we handle.
Someone suggested defining a RelOptRule, but since VolcanoPlanner performs the optimization in a dynamic-programming fashion, I'm not sure the rule would be applied in the right order. Any ideas?
You don't have to use VolcanoPlanner. There's also HepPlanner, which just applies the rules that you give it as a HepProgram.
If I have a member function declared like so:
double* restrict data(){
    return m_data; // array member variable
}
can the restrict keyword do anything?
Apparently, with g++ (x86 architecture) it cannot, but are there other compilers/architectures where this type of construction makes sense, and would allow for optimized machine code generation?
I'm asking because the Blitz library (Blitz++) has a whole slew of functions declared in this manner, and it doesn't make sense that someone would go in and add the restrict keyword unless it actually does something. So before I go in and remove the restrict's (to get rid of compiler warnings) I'd like to know how I'm abusing the code.
WHAT restrict ARE WE TALKING ABOUT?
restrict is, as it currently stands, non-standard, which means that it's a compiler extension; it's non-portable in the sense that the C++ Standard doesn't mandate its existence, nor is there any formal text in it that tells us what it is supposed to do.
restrict is currently compiler specific in C++, and one has to resort to the compiler documentation of their choice to see exactly what it is doing.
SOME THOUGHTS
There are many papers about the usage of restrict, among them:
Restricted Pointers - Using the GNU Compiler Collection
restrict - wikipedia.org
Demystifying The Restrict Keyword - CellPerformance
It's hinted in several places that the purpose of restrict is to qualify pointers so that the compiler knows that two pointers in the same scope don't refer to the same memory location.
With this in mind we can easily see that the return type has no potential collision with other pointers, so using it in such a context will generally not open up any optimization opportunities. However, one must refer to the documented behaviour of the implementation in use to know for sure; as stated, restrict is not standard, yet.
I also found the following thread where the developers of Blitz++ discuss the removal of restrict applied to the return type of a function, since it doesn't do anything:
Re: [Blitz-devel] type qualifiers ignored on function return type
A LITTLE NOTE
As a further note, here's what the LLVM Documentation says about noalias vs restrict:
For function return values, C99’s restrict is not meaningful, while LLVM’s noalias is.
Generally, the restrict qualifier can only help the compiler optimize code better. By removing 'restrict' you don't break anything, but if you add it without care you can get errors. A great example is the difference between memcpy and memmove. You can always use the slower memmove, but you can use the faster memcpy only if you know that src and dst don't overlap.
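To make the memcpy/memmove point concrete, here is a minimal sketch. It assumes a compiler that accepts the `__restrict` spelling (GCC, Clang and MSVC do, as an extension); `scale_into` and `shift_right` are hypothetical names invented for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Assumption: __restrict is a compiler extension, not standard C++.
// The caller promises that dst and src never overlap, which may let the
// compiler vectorize the loop without emitting an overlap check.
void scale_into(double* __restrict dst, const double* __restrict src,
                std::size_t n, double factor) {
    for (std::size_t i = 0; i < n; ++i) {
        dst[i] = src[i] * factor;
    }
}

// Hypothetical helper: copies the first `count` characters of `text`
// to position `offset` inside the same string. Source and destination
// may overlap, so memmove is the safe choice; memcpy would be undefined
// behaviour for overlapping ranges.
std::string shift_right(std::string text, std::size_t offset, std::size_t count) {
    assert(offset + count <= text.size());
    std::memmove(&text[offset], &text[0], count);
    return text;
}
```

Calling `scale_into(buf + 1, buf, n, 2.0)` would break the no-overlap promise, which is exactly the kind of error that adding restrict carelessly can introduce.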
Can I expect to see a performance hit for the casting in this:
enum class myEnum {A,B,C};
myArray[(int)myEnum::A] = 123;
Compared to this?
enum myEnum {A,B,C};
myArray[A] = 123;
I'm leaning towards the new style enum classes for the type safety, but don't want to do it at the expense of performance.
It depends whether the enum value used as an index is known at compile time or passed in a variable.
That is, myArray[(int)myEnum::A] will not incur any penalty, but myArray[(int)e] might, depending on the physical representation of e (i.e., it might be necessary to "extend" it).
On the other hand, a simple extension is a trivial operation that is unlikely to ever show up as a performance issue: things like branch prediction (in conditionals) and caching are much more important in most applications at the low level, and at a higher level algorithms matter.
Note: to avoid the extension issue in the runtime scenario, you could define the base type of myEnum to be the natural type that is expected for compiler arithmetic, I believe a ptrdiff_t would be most appropriate here. It is a big integer though.
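As a rough sketch of that note, the underlying type can be fixed in the enum declaration. The array size of 3 and the `at` helper are assumptions made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Same enumerators as in the question, but with an explicit underlying
// type so that no widening is needed when a value is used as an index.
enum class myEnum : std::ptrdiff_t { A, B, C };

static_assert(std::is_same<std::underlying_type<myEnum>::type,
                           std::ptrdiff_t>::value,
              "myEnum is stored as ptrdiff_t");

// Hypothetical accessor for the runtime case: e is only known at run time,
// and the cast converts to the type the enum is already stored as.
int& at(int (&array)[3], myEnum e) {
    const std::ptrdiff_t i = static_cast<std::ptrdiff_t>(e);
    assert(i >= 0 && i < 3);
    return array[i];
}
```

The trade-off mentioned above still applies: every myEnum object now occupies as much space as a ptrdiff_t.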
Of course it's ultimately up to the compiler, but it's hard to see why any reasonable compiler would generate different code in these two cases.
I've tested this with g++ 4.7.2 on Intel, and they compile to identical assembly code.
No, it's not likely that the casting should have any impact on performance. This is all resolved at compile time.
I would expect not, but it depends on the compiler implementation. Why don't you try both approaches and time them? Try a few different compiler optimisation settings too, so you know the result still holds for the way your production code is compiled.
I doubt there would be any assembly instructions at all for the cast, since the entire expression (int)myEnum::A can be evaluated at compile time.
But if you really want to know, make a pair of sample programs, and analyze both by dumping disassembly, and/or measuring performance.
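One way to convince yourself that the cast is a compile-time constant, without reading disassembly, is to use it in a static_assert. This is only a sketch; `index_of`, `set` and the array size are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

enum class myEnum { A, B, C };

// Hypothetical helper that hides the cast; constexpr, so a call with a
// constant argument is itself a constant expression.
constexpr std::size_t index_of(myEnum e) {
    return static_cast<std::size_t>(e);
}

// These would fail to compile if the cast were not a constant expression.
static_assert(index_of(myEnum::A) == 0, "A maps to index 0");
static_assert((int)myEnum::C == 2, "C maps to index 2");

int myArray[3] = {};

// Stores a value at the slot named by the enumerator.
void set(myEnum e, int value) {
    assert(index_of(e) < 3);
    myArray[index_of(e)] = value;
}
```

With that in place, `set(myEnum::A, 123);` is the type-safe counterpart of `myArray[A] = 123;`.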
In my code I've been rewriting static_cast<int *> about a million times. Is there a way to redefine a keyword so that whenever I call it, it does the same thing?
For example, cast would do the same thing as static_cast<int *>.
static_cast has the benefit that C++ programmers will recognize exactly what it is without needing to go find your #define or other statement. I would highly recommend you continue to use static_cast.
However, my assumption is that your problem is the number of keystrokes required, and so the best solution would be to use a text editor which supports macros. This way, the code that ends up saved does use the standard static_cast<T>(x) syntax, but you may only need to type something such as [sc]tabTtabxtab.
Information on how to set this up can be found in the documentation of such editors. I'm not a big fan of highly customizable editors, so the specifics are beyond my knowledge.
Asking for easier way to do something dangerous…
Yes, there are a lot of ways to accomplish what you ask for, including
C++ template,
macro definition,
editor shortcut,
custom preprocessing,
trained monkey that fixes up the code.
But all you accomplish is to make your code even less grokable.
Instead, try to figure out how come you so often lose type information so that you have to put it back by hand, so to speak.
The general solution is, very simply, to not throw away type information in the first place.
You mean like this?
#define SCAST(T,X) static_cast<T>(X)
I should warn you though that generally the overuse of defines like this can make your code obscure and harder to comprehend.
More importantly, you have to watch out with macros, as they can cause hard-to-find bugs. For example:
#define SQUARE(X) ((X)*(X))
If you call this with x++, the preprocessor will do a literal substitution and you'll end up with (x++)*(x++), which won't give the answer you're looking for. To make things worse, because the substitution happens behind the scenes, you'll have a hard time finding the cause.
I would suggest you instead look into template functions or just inline helper functions when you can, it's safe and will avoid the problem I pointed out.
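A minimal sketch of that suggestion follows. The names `square` and `as_int_ptr` are made up for this example; `as_int_ptr` assumes the cast being repeated is from void* (or another pointer type that static_cast accepts) to int*.

```cpp
#include <cassert>

// Function template replacing the SQUARE macro: the argument is
// evaluated exactly once, before the multiplication.
template <typename T>
constexpr T square(T x) {
    return x * x;
}

// Hypothetical short name for the cast from the question. Unlike a macro,
// it is type-checked: passing e.g. a double* fails to compile.
template <typename From>
int* as_int_ptr(From* p) {
    return static_cast<int*>(p);
}

// Usage: x is incremented once, and the result is the square of the old value.
void demo() {
    int x = 3;
    assert(square(x++) == 9);
    assert(x == 4);
}
```

The point made in the other answers still stands: a reader has to look up what `as_int_ptr` does, whereas static_cast is recognized immediately.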
I'm interested in writing good code from the beginning instead of optimizing the code later. Sorry for not providing a benchmark; I don't have a working scenario at the moment. Thanks for your attention!
What are the performance gains of using FunctionY over FunctionX?
There is a lot of discussion on Stack Overflow about this already, but I'm unsure about the case of accessing nested sub-members, as shown below. Will the compiler (say VS2008) optimize FunctionX into something like FunctionY?
void FunctionX(Obj * pObj)
{
    pObj->MemberQ->MemberW->MemberA.function1();
    pObj->MemberQ->MemberW->MemberA.function2();
    pObj->MemberQ->MemberW->MemberB.function1();
    pObj->MemberQ->MemberW->MemberB.function2();
    ..
    pObj->MemberQ->MemberW->MemberZ.function1();
    pObj->MemberQ->MemberW->MemberZ.function2();
}
void FunctionY(Obj * pObj)
{
    W * localPtr = pObj->MemberQ->MemberW;
    localPtr->MemberA.function1();
    localPtr->MemberA.function2();
    localPtr->MemberB.function1();
    localPtr->MemberB.function2();
    ...
    localPtr->MemberZ.function1();
    localPtr->MemberZ.function2();
}
If none of the member pointers are volatile or pointers to volatile, and operator-> is not overloaded for any member in the chain, both functions are the same.
The optimization you suggested is widely known as Common Subexpression Elimination and has been supported by the vast majority of compilers for decades.
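To see why an overloaded operator-> matters, here is a small sketch with a hypothetical `CountingPtr` whose operator-> has an observable side effect. With such a type the two styles are no longer equivalent, so the compiler is not free to merge the repeated dereferences.

```cpp
#include <cassert>

struct Inner {
    int value = 0;
    void function1() { ++value; }
    void function2() { ++value; }
};

// Hypothetical smart-pointer-like wrapper: every use of -> is counted.
struct CountingPtr {
    Inner* target;
    int* derefs;
    Inner* operator->() const {
        assert(target != nullptr && derefs != nullptr);
        ++*derefs;
        return target;
    }
};

// FunctionX style: goes through operator-> on every line.
void functionX(CountingPtr p) {
    p->function1();
    p->function2();
}

// FunctionY style: goes through operator-> once and keeps the raw pointer.
void functionY(CountingPtr p) {
    Inner* local = p.operator->();
    local->function1();
    local->function2();
}
```

For plain pointers, as in the question, no such side effect exists and the optimizer can treat the repeated chain as a common subexpression.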
In theory, you save on the extra pointer dereferences, HOWEVER, in the real world, the compiler will probably optimize it out for you, so it's a useless optimization.
This is why it's important to profile first, and then optimize later. The compiler is doing everything it can to help you, so you might as well make sure you're not just doing something it's already doing.
If the compiler is good enough, it should translate FunctionX into something similar to FunctionY.
But you can get different results with different compilers, and with the same compiler using different optimization flags.
With a "dumb" compiler FunctionY should be faster, and IMHO it is more readable and faster to write. So stick with FunctionY.
P.S. You should take a look at a code style guide; normally member and function names should start with a lowercase letter.
I'm working on some existing C++ code that appears to be written poorly, and is very frequently called. I'm wondering if I should spend time changing it, or if the compiler is already optimizing the problem away.
I'm using Visual Studio 2008.
Here is an example:
void someDrawingFunction(....)
{
    GetContext().DrawSomething(...);
    GetContext().DrawSomething(...);
    GetContext().DrawSomething(...);
    .
    .
    .
}
Here is how I would do it:
void someDrawingFunction(....)
{
    MyContext &c = GetContext();
    c.DrawSomething(...);
    c.DrawSomething(...);
    c.DrawSomething(...);
    .
    .
    .
}
Don't guess at where your program is spending time. Profile first to find your bottlenecks, then optimize those.
As for GetContext(), that depends on how complex it is. If it's just returning a class member variable, then chances are that the compiler will inline it. If GetContext() has to perform a more complicated operation (such as looking up the context in a table), the compiler probably isn't inlining it, and you may wish to only call it once, as in your second snippet.
If you're using GCC, you can also tag the GetContext() function with the pure attribute. This will allow it to perform more optimizations, such as common subexpression elimination.
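A rough sketch of what that tagging could look like. `FindContext`, the `MyContext` layout and the lookup table are hypothetical stand-ins for a GetContext() that does a table lookup; the attribute is guarded so that other compilers simply ignore it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical context record and lookup table.
struct MyContext {
    int id;
    int draws;
};

MyContext g_contexts[3] = {{10, 0}, {20, 0}, {30, 0}};

// GCC/Clang extension: "pure" promises that the function has no side
// effects and that its result depends only on its arguments and on
// memory it reads.
#if defined(__GNUC__)
__attribute__((pure))
#endif
MyContext* FindContext(int id);

MyContext* FindContext(int id) {
    for (std::size_t i = 0; i < 3; ++i) {
        if (g_contexts[i].id == id) {
            return &g_contexts[i];
        }
    }
    return nullptr;
}

// Repeated lookups, as in the first snippet of the question. With the
// attribute the compiler is permitted, not required, to reuse a result.
int drawTwice(int id) {
    assert(FindContext(id) != nullptr);
    FindContext(id)->draws += 1;
    FindContext(id)->draws += 1;
    return FindContext(id)->draws;
}
```

Whether the calls are actually merged depends on the compiler version and flags, so checking the generated assembly is still worthwhile.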
If you're sure it's a performance problem, change it. If GetContext is a function call (as opposed to a macro or an inline function), then the compiler is going to HAVE to call it every time, because the compiler can't necessarily see what it's doing, and thus, the compiler probably won't know that it can eliminate the call.
Of course, you'll need to make sure that GetContext ALWAYS returns the same thing, and that this 'optimization' is safe.
If it is logically correct to do it the second way, i.e. calling GetContext() once or multiple times does not affect your program logic, I'd do it the second way even if you profile it and prove that there is no performance difference either way, so the next developer looking at this code will not ask the same question again.
Obviously, if GetContext() has side effects (I/O, updating globals, etc.) then the suggested optimization will produce different results.
So unless the compiler can somehow detect that GetContext() is pure, you should optimize it yourself.
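Here is a small sketch of the side-effect case. The `g_lookups` counter is a made-up stand-in for whatever GetContext() might do besides returning the context (I/O, updating globals); the drawing functions mirror the two snippets from the question.

```cpp
#include <cassert>

struct MyContext {
    int draws = 0;
    void DrawSomething() { ++draws; }
};

MyContext g_context;
int g_lookups = 0;  // hypothetical side effect: counts calls to GetContext()

MyContext& GetContext() {
    ++g_lookups;
    return g_context;
}

// First style: one lookup per draw call.
void drawRepeated() {
    GetContext().DrawSomething();
    GetContext().DrawSomething();
    GetContext().DrawSomething();
}

// Second style: one lookup, reference reused.
void drawCached() {
    MyContext& c = GetContext();
    c.DrawSomething();
    c.DrawSomething();
    c.DrawSomething();
}

// Usage check: the drawing work is identical, the side effect is not.
void compare() {
    g_lookups = 0;
    drawRepeated();
    assert(g_lookups == 3);
    g_lookups = 0;
    drawCached();
    assert(g_lookups == 1);
}
```

Because the counter makes the difference observable, a compiler has to keep all three calls in the first style; only you know whether collapsing them is acceptable.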
If you're wondering what the compiler does, look at the assembly code.
That is such a simple change, I would do it.
It is quicker to fix it than to debate it.
But do you actually have a problem?
Just because it's called often doesn't mean it's called TOO often.
If it seems qualitatively piggy, sample it to see what it's spending time at.
Chances are excellent that it is not what you would have guessed.