How to get Instruction in MachineInstr? - llvm

I wanted to know variable dependence in a real register (like X86:EAX, EBX ...). So, I have created an IR-PASS that can identify dependencies on the IR. This pass uses the newly added variables unsigned HasDependency: 1; and unsigned HasMaybeDependency: 1; in the Value class.
.
.
// Use the same type as the bitfield above so that MSVC will pack them.
unsigned IsUsedByMD : 1;
unsigned HasName : 1;
unsigned HasHungOffUses : 1;
unsigned HasDescriptor : 1;
unsigned HasDependency : 1;
unsigned HasMaybeDependency : 1;
.
.
.
void setDependency() { HasDependency = true; }
void setMaybeDependency() { HasMaybeDependency = true; }
bool hasDependency() const { return HasDependency; }
bool hasMaybeDependency() const { return HasMaybeDependency; }
//static_assert(sizeof(Value) == 2 * sizeof(void *) + 2 * sizeof(unsigned),
// "Value too big");
When applied to a code snippet like this:
extern int foo_called(int a);
int foo(int k)
{
int __attribute__((annotate("xxx"))) a;
for (int i = 0; i < k; i++)
{
int c = a + k;
a += foo_called(c);
}
return 0;
}
which produces this bitcode:
define i32 #"\01?foo##YAHH#Z"(i32 %k) local_unnamed_addr #0 {
entry:
%a = alloca i32, align 4
%0 = bitcast i32* %a to i8*
call void #llvm.lifetime.start.p0i8(i64 4, i8* nonnull %0) #2
call void #llvm.var.annotation(i8* nonnull %0, i8* getelementptr inbounds ([4 x i8], [4 x i8]* #.str, i32 0, i32 0), i8* getelementptr inbounds ([6 x i8], [6 x i8]* #.str.1, i32 0, i32 0), i32 17)
%cmp7 = icmp sgt i32 %k, 0
br i1 %cmp7, label %for.body.lr.ph, label %for.cond.cleanup
for.body.lr.ph: ; preds = %entry
%.pre = load i32, i32* %a, align 4, !tbaa !3
br label %for.body
for.cond.cleanup: ; preds = %for.body, %entry
call void #llvm.lifetime.end.p0i8(i64 4, i8* nonnull %0) #2
ret i32 0
for.body: ; preds = %for.body, %for.body.lr.ph
%1 = phi i32 [ %.pre, %for.body.lr.ph ], [ %add2, %for.body ]
%i.08 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
%add = add nsw i32 %1, %k
%call = call i32 #"\01?foo_called##YAHH#Z"(i32 %add)
%2 = load i32, i32* %a, align 4, !tbaa !3
%add2 = add nsw i32 %2, %call
store i32 %add2, i32* %a, align 4, !tbaa !3
%inc = add nuw nsw i32 %i.08, 1
%exitcond = icmp eq i32 %inc, %k
br i1 %exitcond, label %for.cond.cleanup, label %for.body
}
declare i32 #"\01?foo_called##YAHH#Z"(i32) local_unnamed_addr #3
The result of the the pass on the above bitcode is:
Function - ?foo##YAHH#Z
Annotated Variable List :
- Annotated : a(message: xxx)
Annotated-Variable : a
(Perpect) %add2 = add nsw i32 %2, %call
(Perpect) %2 = load i32, i32* %a, align 4, !tbaa !3
(Perpect) %a = alloca i32, align 4
(Perpect) %cmp7 = icmp sgt i32 %k, 0
(Maybe) %exitcond = icmp eq i32 %inc, %k
(Maybe) %inc = add nuw nsw i32 %i.08, 1
(Maybe) %i.08 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
(Perpect) %call = call i32 #"\01?foo_called##YAHH#Z"(i32 %add)
(Perpect) %add = add nsw i32 %1, %k
(Perpect) %1 = phi i32 [ %.pre, %for.body.lr.ph ], [ %add2, %for.body ]
(Perpect) %.pre = load i32, i32* %a, align 4, !tbaa !3
I followed the SelectionDAGISel.cpp: SelectAllBasicBlocks function to get information from the backend, but I was able to get only AllocaInst, StoreInst, and LoadInst using as follows:
for (MachineBasicBlock &MBB : mf) {
for (MachineInstr& I : MBB) {
for (MachineInstr::mmo_iterator i = I.memoperands_begin(),
e = I.memoperands_end();
i != e; ++i) {
if (const Value *V = (*i)->getValue())
errs() << *V << "\n";
}
}
}
How do I know the correlation between MachineInstr and Instruction? If it is not provided in LLVM, which parts need to be fixed?

This is not normal. This is an trick. But I am using this method very usefully. If you know the normal way, please give me a comment.
I solved this problem using DebugLoc. It is used to represent the line-column-row, function-name etc... information of .c, .cpp files. This information will remain from the time of ;;vm-ir until MachineInstr.
So, if it is guaranteed that DebugLoc is not used in your compiler processing, you can put the address of the class that contains the information needed for the row information. This will allow you to cast the DebugLoc row to the desired class at the right time. (You can use column, because column must less than 2^16.)
The following describes in detail the method I used.
Change file and Re-Build your project.
Several design patterns were used to maximize memory efficiency, so I could not easily change the class.
First, modify DebugLoc-print routine. GOTO DebugLoc.cpp and delete DIScope print routine like this. This processing save you form runtime-error.
void DebugLoc::print(raw_ostream &OS) const {
if (!Loc)
return;
// Print source line info.
//auto *Scope = cast<DIScope>(getScope());
//OS << Scope->getFilename();
OS << ':' << getLine();
if (getCol() != 0)
OS << ':' << getCol();
Second, The verifier should be modified. This syntax will be helpful.
void Verifier::visitDILocation(const DILocation &N) {
- AssertDI(N.getRawScope() && isa<DILocalScope>(N.getRawScope()),
- "location requires a valid scope", &N, N.getRawScope());
+ //AssertDI(N.getRawScope() && isa<DILocalScope>(N.getRawScope()),
+ // "location requires a valid scope", &N, N.getRawScope());
if (auto *IA = N.getRawInlinedAt())
AssertDI(isa<DILocation>(IA), "inlined-at should be a location", &N, IA);
}
Third, there are some formal steps to register a class in DebugLoc. Create initialize function for this.
static LLVMContext cnt;
static MDNode *md;
md = MDNode::get(cnt, DILocation::get(cnt, 100, 100, DIScope::get(cnt, nullptr)));
Last, create register function.
static DebugLoc getDebugLoc(DependencyInstrInfoManager *info)
{
return DebugLoc::get(reinterpret_cast<unsigned> (info), (uint16_t)-1, md);
}
static void setDebugLoc(Instruction *I, ...)
{
DependencyInstrInfoManager *mgr;
if (I->getDebugLoc()) {
mgr = reinterpret_cast<DependencyInstrInfoManager *>
(I->getDebugLoc()->getLine());
} else {
mgr = new DependencyInstrInfoManager();
I->setDebugLoc(getDebugLoc(mgr));
}
mgr->addInfo(new DependencyInstrInfo(I, S, T, ...));
}
DependencyInstrInfoManager is the class for answering the above questions.
Finally, you can print your own information in XXXMCInstLower.cpp:EmitInstruction();(like X86MCInstLower.cpp). The following statement is an example of the output of my case.
void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {
X86MCInstLower MCInstLowering(*MF, *this);
const X86RegisterInfo *RI = MF->getSubtarget<X86Subtarget>().getRegisterInfo();
if (MI->getDebugLoc()) {
DependencyInstrInfoManager *mgr = reinterpret_cast<DependencyInstrInfoManager *>
(MI->getDebugLoc()->getLine());
mgr->doFolding();
for (auto DI : *mgr)
OutStreamer->AddComment(DI->getInfo());
}
Dependency Marking
I have done dependency marking using this method.
int foo(int k)
{
int ANNORATE("b") b = 0;
int ANNORATE("a") a = 0;
for (int i = 0; i < k; i++)
{
int c = a + k;
int d = b + k;
a += foo_called(c);
b += foo_called2(c);
}
return a + foo_called(b);
}
to
# BB#1: # %for.body.preheader
movl %esi, %ebx
.p2align 4, 0x90
LBB0_2 : # %for.body
# =>This Inner Loop Header: Depth=1
addl %esi, %edi # [Perpect, Source:b]
# [Perpect, Source: a]
pushl %edi # [Maybe, Source:b]
# [Perpect, Source: a]
calll "?foo_called##YAHH#Z" # [Maybe, Source:b]
# [Perpect, Source: a]
addl $4, %esp # [Maybe, Source:b]
# [Perpect, Source: a]
addl %eax, 4(%esp)
pushl %edi # [Perpect, Source:b]
calll "?foo_called2##YAHH#Z" # [Perpect, Source:b]
addl $4, %esp # [Perpect, Source:b]
addl(%esp), %eax # [Annotated, Source:b]
movl 4(%esp), %edi # [Perpect, Source:b]
# [Perpect, Source: a]
decl %ebx # [Maybe, Source:b]
movl %eax, (%esp)
jne LBB0_2
jmp LBB0_4

Related

LLVM LoopPass values used outside the loop

I'm writing an LLVM LoopPass in which I need to know which values
are used outside the loop. For that I have this code:
virtual bool runOnLoop(Loop *loop, LPPassManager &LPM)
{
for (auto it = loop->block_begin(); it != loop->block_end(); it++)
{
for (auto inst = (*it)->begin(); inst != (*it)->end(); inst++)
{
if (Is_Used_Outside_This_loop(loop,(Instruction *) inst))
{
errs() << inst->getName().str();
errs() << " is used outside the loop\n";
}
}
}
// ...
}
The inner function seemed right at first, but with the *.ll file below,
it gives incorrect classification for %tmp5 since it is used twice
inside a basic block of the loop.
bool Is_Used_Outside_This_loop(Loop *loop, Value *v)
{
int n=0;
int numUses = v->getNumUses();
for (auto it = loop->block_begin(); it != loop->block_end(); it++)
{
if (v->isUsedInBasicBlock(*it))
{
n++;
}
}
if (n == numUses) return false;
else return true;
}
The following *.ll code shows that %tmp5 is used twice
inside a basic block of the loop. When I carefully searched the API,
I couldn't find anything like Value::numUsesInBasicBlock( ... )
; Function Attrs: nounwind uwtable
define internal void #foo(i8* %s) #0 {
entry:
%s.addr = alloca i8*, align 8
%c = alloca i8, align 1
store i8* %s, i8** %s.addr, align 8
store i8 0, i8* %c, align 1
br label %while.cond
while.cond: ; preds = %while.body, %entry
%tmp = load i8*, i8** %s.addr, align 8
%tmp1 = load i8, i8* %tmp, align 1
%conv = sext i8 %tmp1 to i32
%cmp = icmp eq i32 %conv, 97
br i1 %cmp, label %lor.end, label %lor.rhs
lor.rhs: ; preds = %while.cond
%tmp2 = load i8*, i8** %s.addr, align 8
%tmp3 = load i8, i8* %tmp2, align 1
%conv2 = sext i8 %tmp3 to i32
%cmp3 = icmp eq i32 %conv2, 98
br label %lor.end
lor.end:; preds = %lor.rhs, %while.cond
%tmp4 = phi i1 [ true, %while.cond ], [ %cmp3, %lor.rhs ]
br i1 %tmp4, label %while.body, label %while.end
while.body: ; preds = %lor.end
%tmp5 = load i8*, i8** %s.addr, align 8
%incdec.ptr = getelementptr inbounds i8, i8* %tmp5, i32 1
store i8* %incdec.ptr, i8** %s.addr, align 8
%tmp6 = load i8, i8* %tmp5, align 1
store i8 %tmp6, i8* %c, align 1
br label %while.cond
while.end: ; preds = %lor.end
%tmp7 = load i8*, i8** %s.addr, align 8
%tmp8 = load i8, i8* %tmp7, align 1
%conv5 = sext i8 %tmp8 to i32
%cmp6 = icmp eq i32 %conv5, 99
br i1 %cmp6, label %if.then, label %if.end
if.then: ; preds = %while.end
%tmp9 = load i8*, i8** %s.addr, align 8
%incdec.ptr8 = getelementptr inbounds i8, i8* %tmp9, i32 1
store i8* %incdec.ptr8, i8** %s.addr, align 8
br label %if.end
if.end: ; preds = %if.then, %while.end
ret void
}
Clearly, there's got to be a way of doing this, right? Thanks!
The problem is your exit condition. You ask for uses, but then compare with the number of blocks that the value is used in.
So, if the value is used twice in the same basic block, the number of uses is 2 and the n counter of basic blocks that used the value is only incremented once, hence the mismatch of n and numUses.
Maybe a more concise way to do what you want is:
void FindUsesNotIn(
llvm::SmallPtrSetImpl<llvm::BasicBlock *> &Blocks,
llvm::SmallPtrSetImpl<llvm::Value *> &OutUses) {
for(const auto &b : Blocks)
for(auto &i : *b)
for(const auto &u : i.users()) {
auto *userInst = llvm::dyn_cast<llvm::Instruction>(u);
if(userInst && !Blocks.count(userInst->getParent())) {
OutUses.insert(&i);
break;
}
}
}
and then in the runOnLoop method have something like this:
virtual bool runOnLoop(llvm::Loop *loop, llvm::LPPassManager &LPM) {
llvm::SmallPtrSet<llvm::BasicBlock*, 10> loopBlocks(loop->block_begin(), loop->block_end());
llvm::SmallPtrSet<llvm::Value *, 10> outs;
FindUsesNotIn(loopBlocks, outs);
for(const auto *e : outs)
llvm::dbgs() << *e << '\n';
return false;
}
Here's how I solved it, though it looks overly complicated:
virtual bool runOnLoop(Loop *loop, LPPassManager &LPM)
{
for (auto it = loop->block_begin(); it != loop->block_end(); it++)
{
for (auto inst = (*it)->begin(); inst != (*it)->end(); inst++)
{
int n=0;
for (auto use = inst->use_begin(); use != inst->use_end(); use++)
{
Instruction *i = (Instruction *) use->getUser();
if (BasicBlockBelongsToLoop(i->getParent(),loop))
{
n++;
}
}
assert(n <= inst->getNumUses());
if (n < inst->getNumUses())
{
errs() << inst->getName().str();
errs() << " is used outside the loop\n";
}
}
}
// ...
I also couldn't found in the API, how to check if a basic block belongs to a loop,
so I had to write my own BasicBlockBelongsToLoop, here it is:
bool BasicBlockBelongsToLoop(BasicBlock *BB, Loop *loop)
{
for (auto it = loop->block_begin(); it != loop->block_end(); it++)
{
if (BB == (*it))
{
return true;
}
}
return false;
}

Delete complete branch from llvm ir

There is a branch in ir that I want to delete completely(condtion + branch + true_basic_block + false_basic_block). It looks like this:
%4 = icmp sge i32 %2, %3
br i1 %4, label %5, label %7
; <label>:5 ; preds = %0
%6 = load i32* %x, align 4
store i32 %6, i32* %z, align 4
br label %9
; <label>:7 ; preds = %0
%8 = load i32* %y, align 4
store i32 %8, i32* %z, align 4
br label %9
; <label>:9 ; preds = %7, %5
%10 = call dereferenceable(140) %"class.std::basic_ostream"*#_ZStlsISt11char_traitsIcEERSt13basic_ostreamIcT_ES5_PKc(%"class.std::basic_ostream"* dereferenceable(140) #_ZSt4cout, i8* getelementptr inbounds ([5 x i8]* #.str, i32 0, i32 0))
%11 = load i32* %z, align 4
%12 = call dereferenceable(140) %"class.std::basic_ostream"* #_ZNSolsEi(%"class.std::basic_ostream"* %10, i32 %11)
%13 = call dereferenceable(140) %"class.std::basic_ostream"* #_ZNSolsEPFRSoS_E(%"class.std::basic_ostream"* %12, %"class.std::basic_ostream"* (%"class.std::basic_ostream"*)* #_ZSt4endlIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_)
ret i32 0
Now to delete it , is there a removeBranch function , or do I need to delete instructions one by one. I have been trying the latter way but I have seen every error from "Basic block in main does not have an terminator" to "use remains when def is destroyed", and many more.. I have used erasefromparent, replaceinstwithvalue, replaceinstwithinst, removefromparent, etc.
Can anyone be kind enough to point me in the correct direction?
This is my function_pass :
bool runOnFunction(Function &F) override {
for (auto& B : F)
for (auto& I : B)
if(auto* brn = dyn_cast<BranchInst>(&I))
if(brn->isConditional()){
Instruction* cond = dyn_cast<Instruction>(brn->getCondition());
if(cond->getOpcode() == Instruction::ICmp){
branch_vector.push_back(brn);
//removeConditionalBranch(dyn_cast<BranchInst>(brn));
}
}
/*For now just delete the branches in the vector.*/
for(auto b : branch_vector)
removeConditionalBranch(dyn_cast<BranchInst>(b));
return true;
}
This is the output :
I don't know of any RemoveBranch utility function, but something like this should work. The idea is to delete the branch instruction, then delete anything that becomes dead as a result, and then merge the initial block with the join block.
// for DeleteDeadBlock, MergeBlockIntoPredecessor
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
// for RecursivelyDeleteTriviallyDeadInstructions
#include "llvm/Transforms/Utils/Local.h"
void removeConditionalBranch(BranchInst *Branch) {
assert(Branch &&
Branch->isConditional() &&
Branch->getNumSuccessors() == 2);
BasicBlock *Parent = Branch->getParent();
BasicBlock *ThenBlock = Branch->getSuccessor(0);
BasicBlock *ElseBlock = Branch->getSuccessor(1);
BasicBlock *ThenSuccessor = ThenBlock->getUniqueSuccessor();
BasicBlock *ElseSuccessor = ElseBlock->getUniqueSuccessor();
assert(ThenSuccessor && ElseSuccessor && ThenSuccessor == ElseSuccessor);
Branch->eraseFromParent();
RecursivelyDeleteTriviallyDeadInstructions(Branch->getCondition());
DeleteDeadBlock(ThenBlock);
DeleteDeadBlock(ElseBlock);
IRBuilder<> Builder(Parent);
Builder.CreateBr(ThenSuccessor);
bool Merged = MergeBlockIntoPredecessor(ThenSuccessor);
assert(Merged);
}
This code only handles the simple case you've shown, with the then and else blocks both jumping unconditionally to a common join block (it will fail with an assertion error for anything more complicated). More complicated control flow will be a bit trickier to handle, but you should still be able to use this code as a starting point.

LLVM "Instruction does not dominate all uses" - Inserting new Instruction

I am getting the following error while inserting an instruction using an llvm pass:
Instruction does not dominate all uses!
%add = add nsw i32 10, 2
%cmp3 = icmp ne i32 %a.01, %add
Broken module found, compilation aborted!
I have the source code in a bitcode file whose snippet is:
if.then: ; preds = %entry
%add = add nsw i32 10, 2
br label %if.end
if.else: ; preds = %entry
%sub = sub nsw i32 10, 2
br label %if.end
if.end: ; preds = %if.else, %if.then
%a.0 = phi i32 [ %add, %if.then ], [ %sub, %if.else ]
%a.01 = call i32 #tauInt32Ty(i32 %a.0) ; line A
%add3 = add nsw i32 %a.01, 2
%add4 = add nsw i32 %a.01, 3
%call5 = call i32 (i8*, ...)* #printf(i8* getelementptr inbounds ([7 x i8]* #.str2, i32 0, i32 0), i32 %add3, i32 %add4)
I want to insert a new instruction after "line A" which is :
%cmp3 = icmp ne i32 %a.01, %add
And I have written a function pass whose snippet of the code which does this task is :
for (Function::iterator bb = F.begin(), e = F.end(); bb != e; ++bb) {
for (BasicBlock::iterator i = bb->begin(), e = bb->end(); i != e; ++i) {
std::string str;
if(isa<CallInst>(i))// || true) {
BasicBlock::iterator next_it = i;
next_it++;
Instruction* next = dyn_cast<Instruction>(&*next_it);
CallInst* ci = dyn_cast<CallInst>(&*i);
Function* ff = ci->getCalledFunction();
str = ff->getName();
errs()<<"> "<<str<<"\n";
if(!str.compare("tauInt32Ty")) {
hotPathSSA1::varVersionWithPathsSet::iterator start = tauArguments[&*ci].begin();
hotPathSSA1::varVersionWithPathsSet::iterator end = tauArguments[&*ci].end();
Value* specArgs = start->second; // specArgs points to %add
ICmpInst* int1_cmp_56 = new ICmpInst(next, ICmpInst::ICMP_NE, ci, specArgs, "cmp3");
}
}
}
}
I have not encountered such a problem jet but I think your problem is the if statement. %add belonges to the if.then BasicBlock and it is not accessable from the if.end block. This is why the phi instruction "chooses" which value is available %add or %sub. So you have to take %a.0 for your IcmpInst as argument not %add.

How to create an array of classes types?

I have a single class "Base", and a few tens of classes derived from Base. I would like to have a method that creates me the right class by an index. Like this:
class Base
{
};
class A : public Base
{
}
class B : public Base
{
}
class C : public Base
{
}
Type array = { A, B, C };
and then I could do new array[i];
How could this be achieved with C++(0x)? Usually I would use an the Abstract Factory Pattern. But since I have a LOT of derived classes, this would really slow down the program.
Since the derived classes will be used only once I also taught to use this:
Base *array = { new A, new B, new C };
But this would lead to huge memory consumption, not counting that not every class will always be used.
Any suggestion?
You cannot use an array of classes, but you can use an array of pointers to functions.
typedef std::unique_ptr<Base> (*Creator)();
template <typename T>
std::unique_ptr<Base> make() { return new T{}; }
Creator const array[] = { make<A>, make<B>, make<C> };
int main() {
std::unique_ptr<Base> b = array[1]();
b->foo();
}
For those worried by the cost of creating so many template functions, here is an example:
#include <stdio.h>
struct Base { virtual void foo() const = 0; };
struct A: Base { void foo() const { printf("A"); } };
struct B: Base { void foo() const { printf("B"); } };
struct C: Base { void foo() const { printf("C"); } };
typedef Base* (*Creator)();
template <typename T>
static Base* make() { return new T{}; }
static Creator const array[] = { make<A>, make<B>, make<C> };
Base* select_array(int i) {
return array[i]();
}
Base* select_switch(int i) {
switch(i) {
case 0: return make<A>();
case 1: return make<B>();
case 2: return make<C>();
default: return 0;
}
}
LLVM/Clang generates the following output:
define %struct.Base* #select_array(int)(i32 %i) uwtable {
%1 = sext i32 %i to i64
%2 = getelementptr inbounds [3 x %struct.Base* ()*]* #array, i64 0, i64 %1
%3 = load %struct.Base* ()** %2, align 8, !tbaa !0
%4 = tail call %struct.Base* %3()
ret %struct.Base* %4
}
define noalias %struct.Base* #select_switch(int)(i32 %i) uwtable {
switch i32 %i, label %13 [
i32 0, label %1
i32 1, label %5
i32 2, label %9
]
; <label>:1 ; preds = %0
%2 = tail call noalias i8* #operator new(unsigned long)(i64 8)
%3 = bitcast i8* %2 to i32 (...)***
store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* #vtable for A, i64 0, i64 2) to i32 (...)**), i32 (...)*** %3, align 8
%4 = bitcast i8* %2 to %struct.Base*
br label %13
; <label>:5 ; preds = %0
%6 = tail call noalias i8* #operator new(unsigned long)(i64 8)
%7 = bitcast i8* %6 to i32 (...)***
store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* #vtable for B, i64 0, i64 2) to i32 (...)**), i32 (...)*** %7, align 8
%8 = bitcast i8* %6 to %struct.Base*
br label %13
; <label>:9 ; preds = %0
%10 = tail call noalias i8* #operator new(unsigned long)(i64 8)
%11 = bitcast i8* %10 to i32 (...)***
store i32 (...)** bitcast (i8** getelementptr inbounds ([3 x i8*]* #vtable for C, i64 0, i64 2) to i32 (...)**), i32 (...)*** %11, align 8
%12 = bitcast i8* %10 to %struct.Base*
br label %13
; <label>:13 ; preds = %9, %5, %1, %0
%.0 = phi %struct.Base* [ %12, %9 ], [ %8, %5 ], [ %4, %1 ], [ null, %0 ]
ret %struct.Base* %.0
}
Unfortunately, it is not quite intelligent enough to automatically inline the functions with a regular array code (known issue with the LLVM optimizer, I don't know if gcc does better)... but using switch it is indeed possible.
typedef Base* BaseMaker();
template <class X> Base* make() {
return new X;
}
BaseMaker* makers[] = { make<A>, make<B>, make<C> };
Base* b = makers[2]();

Do modern C++ compilers inline functions which are called exactly once?

As in, say my header file is:
class A
{
void Complicated();
}
And my source file
void A::Complicated()
{
...really long function...
}
Can I split the source file into
void DoInitialStuff(pass necessary vars by ref or value)
{
...
}
void HandleCaseA(pass necessary vars by ref or value)
{
...
}
void HandleCaseB(pass necessary vars by ref or value)
{
...
}
void FinishUp(pass necessary vars by ref or value)
{
...
}
void A::Complicated()
{
...
DoInitialStuff(...);
switch ...
HandleCaseA(...)
HandleCaseB(...)
...
FinishUp(...)
}
Entirely for readability and without any fear of impact in terms of performance?
You should mark the functions static so that the compiler know they are local to that translation unit.
Without static the compiler cannot assume (barring LTO / WPA) that the function is only called once, so is less likely to inline it.
Demonstration using the LLVM Try Out page.
That said, code for readability first, micro-optimizations (and such tweaking is a micro-optimization) should only come after performance measures.
Example:
#include <cstdio>
static void foo(int i) {
int m = i % 3;
printf("%d %d", i, m);
}
int main(int argc, char* argv[]) {
for (int i = 0; i != argc; ++i) {
foo(i);
}
}
Produces with static:
; ModuleID = '/tmp/webcompile/_27689_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"
#.str = private constant [6 x i8] c"%d %d\00" ; <[6 x i8]*> [#uses=1]
define i32 #main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
%cmp4 = icmp eq i32 %argc, 0 ; <i1> [#uses=1]
br i1 %cmp4, label %for.end, label %for.body
for.body: ; preds = %for.body, %entry
%0 = phi i32 [ %inc, %for.body ], [ 0, %entry ] ; <i32> [#uses=3]
%rem.i = srem i32 %0, 3 ; <i32> [#uses=1]
%call.i = tail call i32 (i8*, ...)* #printf(i8* getelementptr inbounds ([6 x i8]* #.str, i64 0, i64 0), i32 %0, i32 %rem.i) nounwind ; <i32> [#uses=0]
%inc = add nsw i32 %0, 1 ; <i32> [#uses=2]
%exitcond = icmp eq i32 %inc, %argc ; <i1> [#uses=1]
br i1 %exitcond, label %for.end, label %for.body
for.end: ; preds = %for.body, %entry
ret i32 0
}
declare i32 #printf(i8* nocapture, ...) nounwind
Without static:
; ModuleID = '/tmp/webcompile/_27859_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"
#.str = private constant [6 x i8] c"%d %d\00" ; <[6 x i8]*> [#uses=1]
define void #foo(int)(i32 %i) nounwind {
entry:
%rem = srem i32 %i, 3 ; <i32> [#uses=1]
%call = tail call i32 (i8*, ...)* #printf(i8* getelementptr inbounds ([6 x i8]* #.str, i64 0, i64 0), i32 %i, i32 %rem) ; <i32> [#uses=0]
ret void
}
declare i32 #printf(i8* nocapture, ...) nounwind
define i32 #main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
%cmp4 = icmp eq i32 %argc, 0 ; <i1> [#uses=1]
br i1 %cmp4, label %for.end, label %for.body
for.body: ; preds = %for.body, %entry
%0 = phi i32 [ %inc, %for.body ], [ 0, %entry ] ; <i32> [#uses=3]
%rem.i = srem i32 %0, 3 ; <i32> [#uses=1]
%call.i = tail call i32 (i8*, ...)* #printf(i8* getelementptr inbounds ([6 x i8]* #.str, i64 0, i64 0), i32 %0, i32 %rem.i) nounwind ; <i32> [#uses=0]
%inc = add nsw i32 %0, 1 ; <i32> [#uses=2]
%exitcond = icmp eq i32 %inc, %argc ; <i1> [#uses=1]
br i1 %exitcond, label %for.end, label %for.body
for.end: ; preds = %for.body, %entry
ret i32 0
}
Depends on aliasing (pointers to that function) and function length (a large function inlined in a branch could throw the other branch out of cache, thus hurting performance).
Let the compiler worry about that, you worry about your code :)
A complicated function is likely to have its speed dominated by the operations within the function; the overhead of a function call won't be noticeable even if it isn't inlined.
You don't have much control over the inlining of a function, the best way to know is to try it and find out.
A compiler's optimizer might be more effective with shorter pieces of code, so you might find it getting faster even if it's not inlined.
If you split up your code into logical groupings the compiler will do what it deems best: If it's short and easy, the compiler should inline it and the result is the same. If however the code is complicated, making an extra function call might actually be faster than doing all the work inlined, so you leave the compiler the option to do that too. On top of all that, the logically split code can be far easier for a maintainer to grok and avoid future bugs.
I suggest you create a helper class to break your complicated function into method calls, much like you were proposing, but without the long, boring and unreadable task of passing arguments to each and every one of these smaller functions. Pass these arguments only once by making them member variables of the helper class.
Don't focus on optimization at this point, make sure your code is readable and you'll be fine 99% of the time.