Using temporary arrays to cut down on code - inefficient? - c++

I'm new to c++ (and SO) so sorry if this is obvious.
I've started using temporary arrays in my code to cut down on repetition and to make it easier to do the same thing to multiple objects. So instead of:
MyObject obj1, obj2, obj3, obj4;
obj1.doSomming(arg);
obj2.doSomming(arg);
obj3.doSomming(arg);
obj4.doSomming(arg);
I'm doing:
MyObject obj1, obj2, obj3, obj4;
MyObject* objs[] = {&obj1, &obj2, &obj3, &obj4};
for (int i = 0; i !=4; ++i)
objs[i]->doSomming(arg);
Is this detrimental to performance? Like, does it cause unnecessary memory allocation? Is it good practice? Thanks.

In general you just shouldn't worry about performance at this level. Very often the things that end up being performance problems turn out to be completely different from what you might expect, especially if you don't have a lot of experience with performance optimization.
You should always think about writing clear code first, and if performance matters then you should think about it in algorithmic terms (i.e., big-O). Then you should measure performance and let that guide where you spend your effort at optimization.
Now, you can make the code even clearer and more straightforward if you avoid the intermediate array and just use an array for the original objects:
MyObject obj[4];
for (int i = 0; i !=4; ++i)
objs[i].doSomming(arg);
But no, an optimizing compiler should generally have no problem with this.
For example, if I take the code:
struct MyObject {
void doSomming() {
std::printf("Hello\n");
}
};
void foo1() {
MyObject obj1, obj2, obj3, obj4;
obj1.doSomming();
obj2.doSomming();
obj3.doSomming();
obj4.doSomming();
}
void foo2() {
MyObject obj1, obj2, obj3, obj4;
MyObject* objs[] = {&obj1, &obj2, &obj3, &obj4};
for (int i = 0; i !=4; ++i)
objs[i]->doSomming();
}
void foo3() {
MyObject obj[4];
for (int i = 0; i !=4; ++i)
obj[i].doSomming();
}
and produce LLVM IR (because it's more compact than actual assembly), I get the following with -O3.
define void #_Z4foo1v() nounwind uwtable ssp {
entry:
%puts.i = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
%puts.i1 = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
%puts.i2 = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
%puts.i3 = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
ret void
}
define void #_Z4foo2v() nounwind uwtable ssp {
entry:
%puts.i = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
%puts.i.1 = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
%puts.i.2 = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
%puts.i.3 = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
ret void
}
define void #_Z4foo3v() nounwind uwtable ssp {
entry:
%puts.i = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
%puts.i.1 = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
%puts.i.2 = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
%puts.i.3 = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
ret void
}
At -O3 the loop gets unrolled and the code is identical with the original version. With -Os the loops don't get unrolled, but the the pointer indirection and even the arrays disappear because they're not needed after inlining:
define void #_Z4foo2v() nounwind uwtable optsize ssp {
entry:
br label %for.body
for.body: ; preds = %entry, %for.body
%i.05 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
%puts.i = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
%inc = add nsw i32 %i.05, 1
%cmp = icmp eq i32 %inc, 4
br i1 %cmp, label %for.end, label %for.body
for.end: ; preds = %for.body
ret void
}
define void #_Z4foo3v() nounwind uwtable optsize ssp {
entry:
br label %for.body
for.body: ; preds = %entry, %for.body
%i.03 = phi i32 [ 0, %entry ], [ %inc, %for.body ]
%puts.i = tail call i32 #puts(i8* getelementptr inbounds ([6 x i8]* #str, i64 0, i64 0)) nounwind
%inc = add nsw i32 %i.03, 1
%cmp = icmp eq i32 %inc, 4
br i1 %cmp, label %for.end, label %for.body
for.end: ; preds = %for.body
ret void
}

Related

LLVM asserts "Resolving symbol with incorrect flags"

I'm following the example of Kaleidoscope to write a minimum IR file interpreter. It takes one command line argument, which is a path to .ll file, and executes the main function in the file. But when I tested it on an IR file, it failed with:
Assertion failed: (KV.second.getFlags() & ~WeakFlags) == (I->second & ~WeakFlags) && "Resolving symbol with incorrect flags", file <path>\llvm\lib\ExecutionEngine\Orc\Core.cpp, line 2775
Considering the simplicity of my code (which only has 63 lines), I can't figure out what's wrong in it. Please help 😭!!!
Full Source Code
#include <iostream>
#include <llvm/IR/DataLayout.h>
#include <llvm/IR/LLVMContext.h>
#include <llvm/IR/Module.h>
#include <llvm/IRReader/IRReader.h>
#include <llvm/Support/SourceMgr.h>
#include <llvm/Support/TargetSelect.h>
#include <llvm/ExecutionEngine/Orc/CompileUtils.h>
#include <llvm/ExecutionEngine/Orc/Core.h>
#include <llvm/ExecutionEngine/Orc/ExecutionUtils.h>
#include <llvm/ExecutionEngine/Orc/ExecutorProcessControl.h>
#include <llvm/ExecutionEngine/Orc/IRCompileLayer.h>
#include <llvm/ExecutionEngine/Orc/JITTargetMachineBuilder.h>
#include <llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h>
#include <llvm/ExecutionEngine/SectionMemoryManager.h>
using namespace llvm;
int
main(int argc, char* argv[])
{
InitializeNativeTarget();
InitializeNativeTargetAsmPrinter();
InitializeNativeTargetAsmParser();
orc::ThreadSafeContext tsctx(std::make_unique<LLVMContext>());
SMDiagnostic error;
auto mod = parseIRFile(argv[1], error, *tsctx.getContext());
auto epc = orc::SelfExecutorProcessControl::Create();
cantFail(epc.takeError());
orc::ExecutionSession es(std::move(*epc));
auto triple = es.getExecutorProcessControl().getTargetTriple();
orc::JITTargetMachineBuilder jtmb(triple);
auto dl = jtmb.getDefaultDataLayoutForTarget();
cantFail(dl.takeError());
orc::RTDyldObjectLinkingLayer ol(
es, []() { return std::make_unique<SectionMemoryManager>(); });
orc::IRCompileLayer cl(
es, ol, std::make_unique<orc::ConcurrentIRCompiler>(std::move(jtmb)));
auto& jd = es.createBareJITDylib("jd");
jd.addGenerator(
cantFail(orc::DynamicLibrarySearchGenerator::GetForCurrentProcess(
dl->getGlobalPrefix())));
cantFail(cl.add(jd, orc::ThreadSafeModule(std::move(mod), tsctx)));
orc::MangleAndInterner mangle(es, *dl);
auto f = es.lookup({ &jd }, mangle("main"));
cantFail(f.takeError());
return reinterpret_cast<int (*)()>(f->getAddress())();
}
Test .ll file
; ModuleID = 'sum.ll'
source_filename = "sum.c"
target datalayout = "e-m:w-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-pc-windows-msvc19.29.30133"
%struct._iobuf = type { i8* }
%struct.__crt_locale_pointers = type { %struct.__crt_locale_data*, %struct.__crt_multibyte_data* }
%struct.__crt_locale_data = type opaque
%struct.__crt_multibyte_data = type opaque
$scanf = comdat any
$__local_stdio_scanf_options = comdat any
$"??_C#_02DPKJAMEF#?$CFd?$AA#" = comdat any
#"??_C#_02DPKJAMEF#?$CFd?$AA#" = linkonce_odr dso_local unnamed_addr constant [3 x i8] c"%d\00", comdat, align 1
#__local_stdio_scanf_options._OptionsStorage = internal global i64 0, align 8
; Function Attrs: nounwind uwtable
define dso_local i32 #main() local_unnamed_addr #0 {
%1 = alloca i32, align 4
%2 = bitcast i32* %1 to i8*
call void #llvm.lifetime.start.p0i8(i64 4, i8* nonnull %2) #6
%3 = call i32 (i8*, ...) #scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* #"??_C#_02DPKJAMEF#?$CFd?$AA#", i64 0, i64 0), i32* nonnull %1)
%4 = load i32, i32* %1, align 4, !tbaa !4
%5 = call i32 (i8*, ...) #scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* #"??_C#_02DPKJAMEF#?$CFd?$AA#", i64 0, i64 0), i32* nonnull %1)
%6 = load i32, i32* %1, align 4, !tbaa !4
%7 = add nsw i32 %6, %4
%8 = call i32 (i8*, ...) #scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* #"??_C#_02DPKJAMEF#?$CFd?$AA#", i64 0, i64 0), i32* nonnull %1)
%9 = load i32, i32* %1, align 4, !tbaa !4
%10 = add nsw i32 %7, %9
%11 = call i32 (i8*, ...) #scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* #"??_C#_02DPKJAMEF#?$CFd?$AA#", i64 0, i64 0), i32* nonnull %1)
%12 = load i32, i32* %1, align 4, !tbaa !4
%13 = add nsw i32 %10, %12
%14 = call i32 (i8*, ...) #scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* #"??_C#_02DPKJAMEF#?$CFd?$AA#", i64 0, i64 0), i32* nonnull %1)
%15 = load i32, i32* %1, align 4, !tbaa !4
%16 = add nsw i32 %13, %15
%17 = call i32 (i8*, ...) #scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* #"??_C#_02DPKJAMEF#?$CFd?$AA#", i64 0, i64 0), i32* nonnull %1)
%18 = load i32, i32* %1, align 4, !tbaa !4
%19 = add nsw i32 %16, %18
%20 = call i32 (i8*, ...) #scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* #"??_C#_02DPKJAMEF#?$CFd?$AA#", i64 0, i64 0), i32* nonnull %1)
%21 = load i32, i32* %1, align 4, !tbaa !4
%22 = add nsw i32 %19, %21
%23 = call i32 (i8*, ...) #scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* #"??_C#_02DPKJAMEF#?$CFd?$AA#", i64 0, i64 0), i32* nonnull %1)
%24 = load i32, i32* %1, align 4, !tbaa !4
%25 = add nsw i32 %22, %24
%26 = call i32 (i8*, ...) #scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* #"??_C#_02DPKJAMEF#?$CFd?$AA#", i64 0, i64 0), i32* nonnull %1)
%27 = load i32, i32* %1, align 4, !tbaa !4
%28 = add nsw i32 %25, %27
%29 = call i32 (i8*, ...) #scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* #"??_C#_02DPKJAMEF#?$CFd?$AA#", i64 0, i64 0), i32* nonnull %1)
%30 = load i32, i32* %1, align 4, !tbaa !4
%31 = add nsw i32 %28, %30
call void #llvm.lifetime.end.p0i8(i64 4, i8* nonnull %2) #6
ret i32 %31
}
; Function Attrs: argmemonly mustprogress nofree nosync nounwind willreturn
declare void #llvm.lifetime.start.p0i8(i64 immarg, i8* nocapture) #1
; Function Attrs: inlinehint nobuiltin nounwind uwtable
define linkonce_odr dso_local i32 #scanf(i8* %0, ...) local_unnamed_addr #2 comdat {
%2 = alloca i8*, align 8
%3 = bitcast i8** %2 to i8*
call void #llvm.lifetime.start.p0i8(i64 8, i8* nonnull %3) #6
call void #llvm.va_start(i8* nonnull %3)
%4 = load i8*, i8** %2, align 8, !tbaa !8
%5 = call %struct._iobuf* #__acrt_iob_func(i32 0) #6
%6 = call i64* #__local_stdio_scanf_options() #6
%7 = load i64, i64* %6, align 8, !tbaa !10
%8 = call i32 #__stdio_common_vfscanf(i64 %7, %struct._iobuf* %5, i8* %0, %struct.__crt_locale_pointers* null, i8* %4) #6
call void #llvm.va_end(i8* nonnull %3)
call void #llvm.lifetime.end.p0i8(i64 8, i8* nonnull %3) #6
ret i32 %8
}
; Function Attrs: argmemonly mustprogress nofree nosync nounwind willreturn
declare void #llvm.lifetime.end.p0i8(i64 immarg, i8* nocapture) #1
; Function Attrs: mustprogress nofree nosync nounwind willreturn
declare void #llvm.va_start(i8*) #3
; Function Attrs: mustprogress nofree nosync nounwind willreturn
declare void #llvm.va_end(i8*) #3
declare dso_local %struct._iobuf* #__acrt_iob_func(i32) local_unnamed_addr #4
declare dso_local i32 #__stdio_common_vfscanf(i64, %struct._iobuf*, i8*, %struct.__crt_locale_pointers*, i8*) local_unnamed_addr #4
; Function Attrs: noinline nounwind uwtable
define linkonce_odr dso_local i64* #__local_stdio_scanf_options() local_unnamed_addr #5 comdat {
ret i64* #__local_stdio_scanf_options._OptionsStorage
}
attributes #0 = { nounwind uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #1 = { argmemonly mustprogress nofree nosync nounwind willreturn }
attributes #2 = { inlinehint nobuiltin nounwind uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #3 = { mustprogress nofree nosync nounwind willreturn }
attributes #4 = { "frame-pointer"="none" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #5 = { noinline nounwind uwtable "frame-pointer"="none" "min-legal-vector-width"="0" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" }
attributes #6 = { nounwind }
!llvm.module.flags = !{!0, !1, !2}
!llvm.ident = !{!3}
!0 = !{i32 1, !"wchar_size", i32 2}
!1 = !{i32 7, !"PIC Level", i32 2}
!2 = !{i32 7, !"uwtable", i32 1}
!3 = !{!"clang version 13.0.1"}
!4 = !{!5, !5, i64 0}
!5 = !{!"int", !6, i64 0}
!6 = !{!"omnipotent char", !7, i64 0}
!7 = !{!"Simple C/C++ TBAA"}
!8 = !{!9, !9, i64 0}
!9 = !{!"any pointer", !6, i64 0}
!10 = !{!11, !11, i64 0}
!11 = !{!"long long", !6, i64 0}
Which is compiled from:
#include <stdio.h>
int
main()
{
int n, sum = 0;
for (int i = 0; i < 10; ++i) {
scanf("%d", &n);
sum += n;
}
return sum;
}
Well, it turns out that THERE IS NOTHING WRONG WITH MY CODE!
I compile and test the same code in Linux environment (WSL2), and everything works fine. I'm pretty sure that this is somewhat compatibility problem between Linux and Windows.
Maybe this is a bug of LLVM?

LLVM getelementptr indices use/meaning

I just started learning LLVM and I am wondering why we have two indices in getelementptr? what are the first and second indices (0 and 0) used for?
#tmp = global [18 x i8] c"Hello world!: %d\0A\00"
declare i32 #printf(i8* %0, ...)
define i32 #fact(i32 %x) {
0:
%1 = icmp sle i32 %x, 0
br i1 %1, label %2, label %3
2:
ret i32 1
3:
%4 = sub i32 %x, 1
%5 = call i32 #fact(i32 %4)
%6 = mul i32 %x, %5
ret i32 %6
}
define i32 #main() {
entry:
%0 = getelementptr [18 x i8], [18 x i8]* #tmp, i32 0, i32 0 ; <---- HERE
%1 = call i32 #fact(i32 23)
%2 = call i32 (i8*, ...) #printf(i8* %0, i32 %1)
ret i32 1
}
enter code here

Executing LLVM code results with Segmentation fault

I have the following code:
#.str_specifier = constant [4 x i8] c"%s\0A\00"
#.int_specifier = constant [4 x i8] c"%d\0A\00"
#.string_var1 = constant [2 x i8] c"f\00"
#.string_var2 = constant [6 x i8] c"Error\00"
; >>> Start Program
declare i32 #printf(i8*, ...)
declare void #exit(i32)
define void #print(i8*) {
call i32 (i8*, ...) #printf(i8* getelementptr ([4 x i8], [4 x i8]* #.str_specifier, i32 0, i32 0), i8* %0)
ret void
}
define void #printi(i32) {
call i32 (i8*, ...) #printf(i8* getelementptr ([4 x i8], [4 x i8]* #.int_specifier, i32 0, i32 0), i32 %0)
ret void
}
declare i8* #malloc(i32)
declare void #free(i8*)
declare void #llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i1)
define void #main()
{ ; >>> Adding function scope
%funcArgs1 = alloca [50 x i32]
; >>> Adding function arguments allocation
; >>> Function body of main
call void #print(i8* getelementptr ([2 x i8], [2 x i8]* #.string_var1, i32 0, i32 0))
%register1 = call i8* #malloc(i32 48)
%register2 = bitcast i8* %register1 to i32*
%register3 = getelementptr inbounds [50 x i32], [50 x i32]* %funcArgs1, i32 0, i32 0
%register4 = ptrtoint i32* %register2 to i32
store i32 %register4, i32* %register3
%register5 = getelementptr inbounds i32, i32* %register2, i32 0
%register6 = add i32 0, 12
store i32 %register6, i32* %register5
%register7 = getelementptr inbounds i32, i32* %register2, i32 1
%register8 = add i32 0, 2
store i32 %register8, i32* %register7
%register9 = getelementptr inbounds i32, i32* %register2, i32 2
store i32 0, i32* %register9
%register10 = getelementptr inbounds i32, i32* %register2, i32 3
store i32 0, i32* %register10
%register11 = getelementptr inbounds i32, i32* %register2, i32 4
store i32 0, i32* %register11
%register12 = getelementptr inbounds i32, i32* %register2, i32 5
store i32 0, i32* %register12
%register13 = getelementptr inbounds i32, i32* %register2, i32 6
store i32 0, i32* %register13
%register14 = getelementptr inbounds i32, i32* %register2, i32 7
store i32 0, i32* %register14
%register15 = getelementptr inbounds i32, i32* %register2, i32 8
store i32 0, i32* %register15
%register16 = getelementptr inbounds i32, i32* %register2, i32 9
store i32 0, i32* %register16
%register17 = getelementptr inbounds i32, i32* %register2, i32 10
store i32 0, i32* %register17
%register18 = getelementptr inbounds i32, i32* %register2, i32 11
store i32 0, i32* %register18
%register19 = load i32, i32* %register3 ; Get variable x
%register20 = add i32 0, 2
%register21 = inttoptr i32 %register20 to i32*
%register22 = getelementptr inbounds i32, i32* %register21, i32 1
%register23 = load i32, i32* %register22
%register24 = getelementptr inbounds i32, i32* %register21, i32 0
%register25 = load i32, i32* %register24
%register26 = add i32 %register23, %register25
%register27 = sub i32 %register26, 4
%register28 = icmp sgt i32 %register20, %register27
br i1 %register28, label %label1, label %label_cont1
label_cont1:
br label %label2
label1:
call void #print(i8* getelementptr ([6 x i8], [6 x i8]* #.string_var2, i32 0, i32 0))
call void #exit(i32 1)
%register200 = add i32 0, 2
br label %label2
label2:
ret void
} ; >>> Closing function scope
For some reason when I run it, it fails with Segmentation fault (core dumped) without printing an understandable error. The strange thing is if I comment the commands in label1 and keep it:
;call void #print(i8* getelementptr ([6 x i8], [6 x i8]* #.string_var2, i32 0, i32 0))
;call void #exit(i32 1)
;%register200 = add i32 0, 2
br label %label2
It does not result with Segmentation fault. If I comment out at least one of those commands (for example print or the sum), it will fail. Why does it happen?
EDIT: I think I'm getting the same result here. (Here with comments)
I understand that "Segmentation fault" means that I tried to access memory that
I do not have access to. but why can't I even create an new register with some value?
EDIT2: It looks like br i1 %register28, label %label1, label %label_cont1 is the real reason.
Edit3: The actual full code I'm trying to figure can be found here. The problem is that changing it to alloca i32 will result with Error (instead of printing 1). It also contains the C code I'm trying to copy to LLVM.
The segfault originates from this line
%register21 = inttoptr i32 %register20 to i32*
After the cast, register21 supposedly points to some memory location. But what memory location ?? It's value is a non existent address that wasn't gotten through a an alloca instr or malloc call.
Therefore all the other registers that try to dereference this pointer get disappointed.
I've altered the inttptr line

Create local string using LLVM

I'm trying to create a local variable using LLVM to store strings, but my code is currently throwing a syntax error.
lli: test2.ll:8:23: error: constant expression type mismatch
%1 = load [6 x i8]* c"hello\00"
My IR code that allocates and store the string:
#.string = private constant [4 x i8] c"%s\0A\00"
define void #main() {
entry:
%a = alloca [255 x i8]
%0 = bitcast [255 x i8]* %a to i8*
%1 = load [6 x i8]* c"hello\00"
%2 = bitcast [6 x i8]* %1 to i8*
%3 = tail call i8* #strncpy(i8* %0, i8* %2, i64 255) nounwind
%4 = getelementptr inbounds [6 x i8]* %a, i32 0, i32 0
%5 = call i32 (i8*, ...)* #printf(i8* getelementptr inbounds ([4 x i8]* #.string, i32 0, i32 0), i8* %4)
ret void
}
declare i32 #printf(i8*, ...)
declare i8* #strncpy(i8*, i8* nocapture, i64) nounwind
Using llc I could see that the way llvm implements is allocating and assigning to a global variable, but I want it to be local (inside a basic block). The code below works, but I don't want to create this var "#.str"...
#str = global [1024 x i8] zeroinitializer, align 16
#.str = private unnamed_addr constant [6 x i8] c"hello\00", align 1
#.string = private constant [4 x i8] c"%s\0A\00"
define i32 #main() nounwind uwtable {
%1 = tail call i8* #strncpy(i8* getelementptr inbounds ([1024 x i8]* #str, i64 0, i64 0), i8* getelementptr inbounds ([6 x i8]* #.str, i64 0, i64 0), i64 1024) nounwind
%2 = call i32 (i8*, ...)* #printf(i8* getelementptr inbounds ([4 x i8]* #.string, i32 0, i32 0), i8* %1)
ret i32 0
}
declare i8* #strncpy(i8*, i8* nocapture, i64) nounwind
declare i32 #printf(i8*, ...) #2
Thanks
I figured out by myself after messing more with my previous code.
Below is the code, so people who had the same problem as I had can check
#.string = private constant [4 x i8] c"%s\0A\00"
define void #main() {
entry:
%a = alloca [6 x i8]
store [6 x i8] [i8 104,i8 101,i8 108,i8 108, i8 111, i8 0], [6 x i8]* %a
%0 = bitcast [6 x i8]* %a to i8*
%1 = call i32 (i8*, ...)* #printf(i8* getelementptr inbounds ([4 x i8]* #.string, i32 0, i32 0), i8* %0)
ret void
}
declare i32 #printf(i8*, ...)
Basically, I had to store each of the characters individually in the array and then bitcast to i8* so I could use the printf function. I couldn't use the c" ... " method which is the one shown in LLVM webpage http://llvm.org/docs/LangRef.html#id669 . It seems it is a special case in the language specification of the IR and they required to be in the global scope.
UPDATE: I was working on the same code again and I found out that the best way was to store a constant instead of each of the i8 symbols. So the line 6, will be replaced by:
store [6 x i8] c"hello\00", [6 x i8]* %0
It is easier to generate code using llvm and it's more readable!

lli won't take kindly to rust's LLVM IR

I have the following rust code.
$ cat hello.rs
fn main() {
println!("Hello world!");
}
$ rustc hello.rs; ./hello
Hello world!
And I'm producing llvm byte code with the --emit=ir option.
$ rustc --emit=ir hello.rs
$ cat hello.ll
; ModuleID = 'hello.rs'
target datalayout = "e-i64:64-f80:128-n8:16:32:64"
target triple = "x86_64-apple-darwin"
%str_slice = type { i8*, i64 }
%"struct.core::fmt::Argument<[]>[#3]" = type { %"enum.core::result::Result<[(), core::fmt::FormatError]>[#3]" (%"enum.core::fmt::Void<[]>[#3]"*, %"struct.core::fmt::Formatter<[]>[#3]"*)*, %"enum.core::fmt::Void<[]>[#3]"* }
%"enum.core::result::Result<[(), core::fmt::FormatError]>[#3]" = type { i8, [0 x i8], [1 x i8] }
%"struct.core::fmt::Formatter<[]>[#3]" = type { i64, i32, i8, %"enum.core::option::Option<[uint]>[#3]", %"enum.core::option::Option<[uint]>[#3]", { void (i8*)**, i8* }, %"struct.core::slice::Items<[core::fmt::Argument]>[#3]", { %"struct.core::fmt::Argument<[]>[#3]"*, i64 } }
%"enum.core::option::Option<[uint]>[#3]" = type { i8, [7 x i8], [1 x i64] }
%"struct.core::slice::Items<[core::fmt::Argument]>[#3]" = type { %"struct.core::fmt::Argument<[]>[#3]"*, %"struct.core::fmt::Argument<[]>[#3]"*, %"struct.core::kinds::marker::ContravariantLifetime<[]>[#3]" }
%"struct.core::kinds::marker::ContravariantLifetime<[]>[#3]" = type {}
%"enum.core::fmt::Void<[]>[#3]" = type {}
%"struct.core::fmt::Arguments<[]>[#3]" = type { { %"enum.core::fmt::rt::Piece<[]>[#3]"*, i64 }, { %"struct.core::fmt::Argument<[]>[#3]"*, i64 } }
%"enum.core::fmt::rt::Piece<[]>[#3]" = type { i8, [7 x i8], [8 x i64] }
#str1364 = internal constant [12 x i8] c"Hello world!"
#_ZN4main15__STATIC_FMTSTR20h3b67a4ad8efbb398oaaE = internal unnamed_addr constant { { i8, %str_slice, [48 x i8] } } { { i8, %str_slice, [48 x i8] } { i8 0, %str_slice { i8* getelementptr inbounds ([12 x i8]* #str1364, i32 0, i32 0), i64 12 }, [48 x i8] undef } }
; Function Attrs: uwtable
define internal void #_ZN4main20he3565cca0bc2f101eaaE() unnamed_addr #0 {
entry-block:
%match = alloca {}
%__args_vec = alloca { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }
%0 = alloca %"struct.core::fmt::Argument<[]>[#3]", i64 0
%__args = alloca %"struct.core::fmt::Arguments<[]>[#3]"
%__adjust = alloca { %"enum.core::fmt::rt::Piece<[]>[#3]"*, i64 }
%__adjust1 = alloca { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }
br label %case_body
case_body: ; preds = %entry-block
%1 = getelementptr inbounds { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* %__args_vec, i32 0, i32 0
store %"struct.core::fmt::Argument<[]>[#3]"* %0, %"struct.core::fmt::Argument<[]>[#3]"** %1
%2 = getelementptr inbounds { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* %__args_vec, i32 0, i32 1
store i64 0, i64* %2
%3 = getelementptr inbounds { %"enum.core::fmt::rt::Piece<[]>[#3]"*, i64 }* %__adjust, i32 0, i32 0
store %"enum.core::fmt::rt::Piece<[]>[#3]"* getelementptr inbounds ([1 x %"enum.core::fmt::rt::Piece<[]>[#3]"]* bitcast ({ { i8, %str_slice, [48 x i8] } }* #_ZN4main15__STATIC_FMTSTR20h3b67a4ad8efbb398oaaE to [1 x %"enum.core::fmt::rt::Piece<[]>[#3]"]*), i32 0, i32 0), %"enum.core::fmt::rt::Piece<[]>[#3]"** %3
%4 = getelementptr inbounds { %"enum.core::fmt::rt::Piece<[]>[#3]"*, i64 }* %__adjust, i32 0, i32 1
store i64 1, i64* %4
%5 = getelementptr inbounds { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* %__args_vec, i32 0, i32 0
%6 = load %"struct.core::fmt::Argument<[]>[#3]"** %5
%7 = getelementptr inbounds { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* %__args_vec, i32 0, i32 1
%8 = load i64* %7
%9 = getelementptr inbounds { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* %__adjust1, i32 0, i32 0
store %"struct.core::fmt::Argument<[]>[#3]"* %6, %"struct.core::fmt::Argument<[]>[#3]"** %9
%10 = getelementptr inbounds { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* %__adjust1, i32 0, i32 1
store i64 %8, i64* %10
call void #"_ZN3fmt22Arguments$LT$$x27a$GT$3new20h30af698883d0f4c86aaE"(%"struct.core::fmt::Arguments<[]>[#3]"* noalias nocapture sret dereferenceable(32) %__args, { %"enum.core::fmt::rt::Piece<[]>[#3]"*, i64 }* noalias nocapture dereferenceable(16) %__adjust, { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* noalias nocapture dereferenceable(16) %__adjust1)
call void #_ZN2io5stdio12println_args20hecac3fc58fb73442EvmE(%"struct.core::fmt::Arguments<[]>[#3]"* noalias nocapture dereferenceable(32) %__args)
br label %join
join: ; preds = %case_body
ret void
}
define i64 #main(i64, i8**) unnamed_addr #1 {
top:
%2 = call i64 #_ZN10lang_start20h7823875e69d425d0BueE(i8* bitcast (void ()* #_ZN4main20he3565cca0bc2f101eaaE to i8*), i64 %0, i8** %1)
ret i64 %2
}
declare i64 #_ZN10lang_start20h7823875e69d425d0BueE(i8*, i64, i8**) unnamed_addr #1
; Function Attrs: inlinehint uwtable
define internal void #"_ZN3fmt22Arguments$LT$$x27a$GT$3new20h30af698883d0f4c86aaE"(%"struct.core::fmt::Arguments<[]>[#3]"* noalias nocapture sret dereferenceable(32), { %"enum.core::fmt::rt::Piece<[]>[#3]"*, i64 }* noalias nocapture dereferenceable(16), { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* noalias nocapture dereferenceable(16)) unnamed_addr #2 {
entry-block:
%__adjust = alloca { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }
%3 = getelementptr inbounds %"struct.core::fmt::Arguments<[]>[#3]"* %0, i32 0, i32 0
%4 = bitcast { %"enum.core::fmt::rt::Piece<[]>[#3]"*, i64 }* %1 to i8*
%5 = bitcast { %"enum.core::fmt::rt::Piece<[]>[#3]"*, i64 }* %3 to i8*
call void #llvm.memcpy.p0i8.p0i8.i64(i8* %5, i8* %4, i64 16, i32 8, i1 false)
%6 = getelementptr inbounds %"struct.core::fmt::Arguments<[]>[#3]"* %0, i32 0, i32 1
%7 = getelementptr inbounds { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* %2, i32 0, i32 0
%8 = load %"struct.core::fmt::Argument<[]>[#3]"** %7
%9 = getelementptr inbounds { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* %2, i32 0, i32 1
%10 = load i64* %9
%11 = getelementptr inbounds { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* %__adjust, i32 0, i32 0
store %"struct.core::fmt::Argument<[]>[#3]"* %8, %"struct.core::fmt::Argument<[]>[#3]"** %11
%12 = getelementptr inbounds { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* %__adjust, i32 0, i32 1
store i64 %10, i64* %12
%13 = bitcast { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* %__adjust to i8*
%14 = bitcast { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* %6 to i8*
call void #llvm.memcpy.p0i8.p0i8.i64(i8* %14, i8* %13, i64 16, i32 8, i1 false)
ret void
}
; Function Attrs: nounwind
declare void #llvm.memcpy.p0i8.p0i8.i64(i8* nocapture, i8* nocapture readonly, i64, i32, i1) unnamed_addr #3
declare void #_ZN2io5stdio12println_args20hecac3fc58fb73442EvmE(%"struct.core::fmt::Arguments<[]>[#3]"* noalias nocapture dereferenceable(32)) unnamed_addr #1
attributes #0 = { uwtable "split-stack" }
attributes #1 = { "split-stack" }
attributes #2 = { inlinehint uwtable "split-stack" }
attributes #3 = { nounwind "split-stack" }
However, lli won't accept this bytecode.
$ lli hello.ll
lli: hello.ll:47:138: error: expected value token
call void #"_ZN3fmt22Arguments$LT$$x27a$GT$3new20h30af698883d0f4c86aaE"(%"struct.core::fmt::Arguments<[]>[#3]"* noalias nocapture sret dereferenceable(32) %__args, { %"enum.core::fmt::rt::Piece<[]>[#3]"*, i64 }* noalias nocapture dereferenceable(16) %__adjust, { %"struct.core::fmt::Argument<[]>[#3]"*, i64 }* noalias nocapture dereferenceable(16) %__adjust1)
^
Any ideas why?
The dereferenceable attribute was added to LLVM just last month (July 2014). I'm assuming the rustc you are using is based on brand new LLVM code while your lli is slightly older. To fix this, update your code and rebuild.
The dereferenceable attribute was added in a commit one month before the OP, so if you are using a released LLVM package, you may not be using a recent enough package.
Try using an LLVM package built from top of trunk.