LLVM Metadata reference instruction

I want to create an MDNode that references another instruction.
MDNode *mdNode = MDNode::get(ctx, llvm::LocalAsMetadata::get(decider)); // decider is an instruction
phi->setMetadata("carry", mdNode); // phi is an instruction
Unfortunately, the verifier fails with "Invalid operand for global metadata!". I'm starting to think this is not possible with the current Metadata API (it seems it might have been supported in the past). Any thoughts?
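For context, here is the same attempt as a self-contained sketch; decider and phi are the question's names, while the includes and the wrapper function tagCarry are mine.
#include "llvm/IR/Instruction.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Metadata.h"
using namespace llvm;

// Wrap one instruction as metadata and attach it to another under the
// "carry" kind -- exactly the pattern the verifier rejects above.
void tagCarry(LLVMContext &Ctx, Instruction *Decider, Instruction *Phi) {
  Metadata *Ref = LocalAsMetadata::get(Decider);
  MDNode *Node = MDNode::get(Ctx, Ref);
  Phi->setMetadata("carry", Node);
}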

Related

How to get FLOPS in RISC-V using SW or HW method?

I am a newbie to RISC-V. I wonder how I could measure FLOPS using a SW or HW method. I tried to use CSRs to get FLOPS, but I ran into some problems.
As far as I know, if I redesign the hpmcounter so that it counts every floating-point operation event, I could get FLOPS by using a CSR read instruction. There is a similar design in the rocket-chip-based SiFive U54-core manual: the SiFive core has sophisticated event-counting capabilities, controlled by the mhpmevent CSR. If I set the lower eight bits of mhpmevent to 0 and enable bits [19:25], I can read the counter value from mhpmcounter. I want to design this field the way the SiFive core does.
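For what it's worth, here is a hedged sketch of what programming the event selector might look like from M-mode firmware, using the encoding described above (event class 0 in the low eight bits, bits [25:19] set). The bit meanings are core-specific and taken from the question, and this assumes RV64 with M-mode access; mhpmevent3/mhpmcounter3 are M-mode CSRs, so none of this can run from a user program under pk.
#include <stdint.h>

// Sketch only: must run in machine mode (e.g. firmware), not user mode.
static inline void select_fp_events(void) {
  uint64_t ev = 0x7FULL << 19;                  // bits [25:19] set, low byte = class 0
  asm volatile ("csrw mhpmevent3, %0" : : "r"(ev));
  asm volatile ("csrw mhpmcounter3, x0");       // reset the counter before measuring
}

static inline uint64_t read_hpmcounter3(void) {
  uint64_t count;
  asm volatile ("csrr %0, mhpmcounter3" : "=r"(count));
  return count;
}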
I tried to imitate this for FLOPS, but I ran into two problems.
First, I can't access mhpmcounter, and I get an illegal instruction error like the one in the following link.
illegal instruction error message!!
I wrote a simple test and it compiles successfully, but there is an illegal instruction error when I run it on Spike and on a cycle-accurate emulator; both use the proxy kernel.
// simple test code (run under the proxy kernel; the csrrs below is what traps)
#include <stdio.h>

int main(void) {
    unsigned long instret1 = 0;
    unsigned long instret2 = 0;
    float a, b, c;
    a = 5.0f;
    b = 4.0f;
    /* read mhpmcounter3 before the FP operation -- this is an M-mode CSR,
       so the read itself raises the illegal instruction exception in U-mode */
    asm volatile ("csrrs %0, mhpmcounter3, x0" : "=r"(instret1));
    c = a + b;
    /* read mhpmcounter3 again after the FP operation */
    asm volatile ("csrrs %0, mhpmcounter3, x0" : "=r"(instret2));
    printf("instruction count : %lu \n", instret2 - instret1);
    return 0;
}
Second, it is hard to switch from user mode to M-mode to access mhpmevent and mhpmcounter. In the RISC-V privileged spec v1.10 I found that the xRET instruction can change the privilege mode. The following text is about xRET in the spec:
The MRET, SRET, or URET instructions are used to return from traps in M-mode, S-mode, or U-mode respectively. When executing an xRET instruction, supposing xPP holds the value y, xIE is set to xPIE; the privilege mode is changed to y; xPIE is set to 1; and xPP is set to U (or M if user-mode is not supported).
If someone knows how to do this, I would like to see the detailed assembly code.
I am trying to modify rocket-chip/src/main/scala/rocket/CSR.scala to redesign the CSR. Is that the only way? First, I want to use Spike to test the counter value. How should I change the code?
If anybody has other ideas or has accomplished this, please point me in the right direction. Thanks!

Are MachineBasicBlocks supposed to implicitly fall through to their successors?

I'm debugging an LLVM target backend, and I am chasing a problem where a certain basic block ends up jumping to "nothing", i.e. just after the end of the function, when compiled with optimizations turned on.
One thing I noticed is that after instruction selection, the machine basic block has a successor but no instruction to actually jump there:
BB#1: derived from LLVM BB %switch.lookup
Predecessors according to CFG: BB#0
%vreg5<def> = SEXT %vreg2, %SREG<imp-def,dead>; DLDREGS:%vreg5 GPR8:%vreg2
%vreg6<def,tied1> = ANDIWRdK %vreg5<tied0>, -2, %SREG<imp-def,dead>; DLDREGS:%vreg6,%vreg5
%vreg7<def> = LDIWRdK 4; DLDREGS:%vreg7
%vreg8<def> = LDIRdK 0; LD8:%vreg8
%vreg9<def> = LDIRdK 1; LD8:%vreg9
CPWRdRr %vreg6<kill>, %vreg7<kill>, %SREG<imp-def>; DLDREGS:%vreg6,%vreg7
%vreg0<def> = Select8 %vreg9<kill>, %vreg8<kill>, 1, %SREG<imp-use>; GPR8:%vreg0 LD8:%vreg9,%vreg8
Successors according to CFG: BB#2(?%)
I see similar ISel results from the x86 LLVM backend and the end result doesn't have a jump-to-nothingness, so I assume this, on its own, is not a problem:
BB#1: derived from LLVM BB %switch.lookup
Predecessors according to CFG: BB#0
%vreg7<def> = MOVSX32rr8 %vreg3; GR32:%vreg7 GR8:%vreg3
%vreg8<def,tied1> = AND32ri %vreg7<tied0>, 65534, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg7
%vreg9<def,tied1> = SUB32ri8 %vreg8<tied0>, 4, %EFLAGS<imp-def>; GR32:%vreg9,%vreg8
%vreg0<def> = SETNEr %EFLAGS<imp-use>; GR8:%vreg0
Successors according to CFG: BB#2(?%)
So my question is: what is the mechanism by which these CFG-specified successors are supposed to be turned into real jumps? Does the x86 backend implement something special for this to work that the backend I'm debugging doesn't?
Should I change my ISelLowering class to lower Select8 into something that ends with an explicit jump, or is that unnecessary (maybe even detrimental to some optimizations), and is there some other magic I need to do so that these implicit successors are correctly lowered?
It is perfectly valid for a MachineBasicBlock to fall through to the next block:
That is valid. Passes that want to reorder basic blocks should only do so if the AnalyzeBranch and related target hooks (Insert/Remove) allow it.
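For reference, these are roughly the TargetInstrInfo hooks the quote refers to, sketched as they looked in the LLVM 3.x era this dump comes from. The class name MyTargetInstrInfo is a placeholder, and the exact signatures vary by release (later versions renamed them to analyzeBranch/insertBranch/removeBranch), so treat this as an outline rather than a drop-in header.
#include "llvm/Target/TargetInstrInfo.h"

namespace llvm {
// Placeholder backend class; only the branch-analysis hooks are shown.
class MyTargetInstrInfo : public TargetInstrInfo {
public:
  // Decode MBB's terminators into a condition plus true/false destinations.
  // Returning false means "understood", which includes a pure fall-through
  // (no terminators, TBB/FBB left null, Cond left empty).
  bool AnalyzeBranch(MachineBasicBlock &MBB, MachineBasicBlock *&TBB,
                     MachineBasicBlock *&FBB,
                     SmallVectorImpl<MachineOperand> &Cond,
                     bool AllowModify) const override;

  // Materialize an explicit branch when a pass (e.g. block placement) breaks
  // the fall-through layout; returns the number of instructions inserted.
  unsigned InsertBranch(MachineBasicBlock &MBB, MachineBasicBlock *TBB,
                        MachineBasicBlock *FBB,
                        const SmallVectorImpl<MachineOperand> &Cond,
                        DebugLoc DL) const override;

  // Strip the branches AnalyzeBranch understood from the end of MBB;
  // returns how many instructions were removed.
  unsigned RemoveBranch(MachineBasicBlock &MBB) const override;
};
} // namespace llvm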

How to interpret results from kcachegrind

Could anyone tell me how to interpret the results from kcachegrind?
I have two versions of my code (v1, v2), both compiled in debug mode. I ran them through valgrind with the options:
valgrind --tool=callgrind -v ....
I opened the generated output files in kcachegrind. I already found that version v2 of the code runs faster than the first version, v1, as it is meant to. But how do I interpret the results from kcachegrind's call graph?
In kcachegrind's All Callers tab, I have the following columns: Incl., Distance, Called, Caller.
IIUC, Called and Caller are the number of times the 'caller' was called in the program, but I don't know about the others.
Another thing: when I select a particular function and then the Callers tab, it shows some more information: Ir, Ir per call, Count, Caller;
and in the Types tab: Event Type, Incl., Self, Short, Formula.
I don't have any idea what these mean.
So far I have read these questions:
KCachegrind interpretation confusion
Confused about profiling result
I use QCacheGrind, so I apologize if something on my screen isn't quite the same as what you see. From what I understand, QCacheGrind is a direct Qt port of KCacheGrind. Additionally, I have the ability to toggle between an Instruction Count and a % of total instructions. For consistency I will refer to the Instruction Count view on any column that can be toggled in this way.
The "All Callers" tab columns should represent the following:
Incl.: The number of instructions that this function generated as a whole, broken down by each caller. Because callers form a hierarchy (hence the Distance column), there may be several entries with the same value if your call stack is deep.
Distance: How many function calls separate the caller on this line from the function selected in the Flat Profile panel.
Called: The number of times the Caller called a function that ultimately led to the execution of the selected function.
Caller: The function that directly called your selected function, or called another of its callers (as determined by Distance).
The Callers tab is more straightforward. It shows the functions that have a distance of 1 from your selected function. In other words, these are the functions that directly invoke your selected function.
Ir: The number of instructions executed in total by the selected function after being called by this caller.
Ir per call: The number of instructions executed per call.
Count: The number of times the selected function was called by the caller.
Caller: The function that directly called the selected function.
For Events, see this page of the handbook. I suspect that if you didn't define your own types, all you should see is "Instruction Fetch" and possibly "Cycle Estimation." The quick breakdown on these columns is as follows:
Incl.: Again the total instructions performed by this function and all functions it calls beneath it.
Self: The instructions performed exclusively by this function. This counter only tracks instructions used by this function, not any instruction used by functions that are called by this function.
Short and Formula: These columns are used when defining a custom Event Type. Yours should either be blank or very short (like CEst = Ir) unless you end up defining your own Types.

casting using llvm::dyn_cast causing an error

I'm trying to iterate over the uses of a store instruction's operand. I followed the programmer's manual, but I get an error.
// x is store instruction pointing to [store i32 5, i32* %a, align 4]
Value *op2 = x->getOperand(1);
for (Value::use_iterator useItr = op2->use_begin(), useEnd = op2->use_end();
     useItr != useEnd; ++useItr) {
  if (Instruction *Inst = dyn_cast_or_null<Instruction>(*useItr))
    errs() << "done";
}
I get this error message:
IR/Use.h:204: UserTy *llvm::value_use_iterator::operator*() const [UserTy = llvm::User]: Assertion `U && "Cannot dereference end iterator!"' failed.
In my understanding, if casting is not possible, dyn_cast should return a null pointer, not an error. I also tried dyn_cast_or_null, but I get the same error.
Well, the problem seems to be with the latest LLVM code, as I was using an SVN checkout of trunk. I finally took the 3.4 stable release and everything works fine now.
Your error is from dereferencing useItr, not from the dyn_cast. Your code looks fine to me, so I took a look at the implementation of value_use_iterator, and the only way I can think of for your error to occur is if one of the value's uses is NULL.
A use of NULL is not something that can occur in a legal module, though. So I recommend running the verifier pass on your module before your code to see if it can spot the problem - otherwise you'll have to carefully examine the module yourself.
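For reference, on LLVM 3.5 and later the use-list API changed, and iterating users() sidesteps the raw use_iterator dereference entirely; a minimal sketch follows (visitUsers is a hypothetical helper).
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Value.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// Hypothetical helper: walk everything that uses the given value (e.g. the
// store's pointer operand) and report the instructions among them.
static void visitUsers(Value *PtrOperand) {
  for (User *U : PtrOperand->users())
    if (Instruction *Inst = dyn_cast<Instruction>(U))
      errs() << "user: " << *Inst << "\n";
}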

llvm alloca dependencies

From my pass, I am trying to determine, for certain Load instructions, their corresponding Alloca instructions (which can be in earlier blocks). The chain can be something like: TargetLoad(var) -> other stores/loads that use var (or dependencies on var) -> alloca(var), linked across several basic blocks. Do you know how I can do this?
I tried to use the methods from DependenceAnalysis and MemoryDependenceAnalysis, but the results were not correct. For instance, MemoryDependenceAnalysis::getDependency should be good with the "Def" option, but it works only for stores, not for loads. I also get a segfault when trying to use MemoryDependenceAnalysis::getNonLocalPointerDependency or MemoryDependenceAnalysis::getPointerDependencyFrom. When I check my result using MemDepResult::getDef(), the result for a Load instruction is the same instruction! So it depends on itself, which is strange since it uses a variable that is defined earlier in the code.
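For reference, here is a hedged sketch of how a query against the legacy MemoryDependenceAnalysis pass is typically written; reportLoadDep is a hypothetical helper, and MDA has to come from getAnalysis<MemoryDependenceAnalysis>() in a pass that requires it in getAnalysisUsage().
#include "llvm/Analysis/MemoryDependenceAnalysis.h"
#include "llvm/IR/Instructions.h"
#include "llvm/Support/raw_ostream.h"
using namespace llvm;

// Hypothetical helper: print what MemoryDependenceAnalysis reports for one load.
static void reportLoadDep(MemoryDependenceAnalysis &MDA, LoadInst *LI) {
  MemDepResult Dep = MDA.getDependency(LI);
  if (Dep.isDef())                     // a defining access in the same basic block
    errs() << "def: " << *Dep.getInst() << "\n";
  else if (Dep.isClobber())            // a possibly-aliasing write in the same block
    errs() << "clobber: " << *Dep.getInst() << "\n";
  else if (Dep.isNonLocal())           // the dependence lives in a predecessor block,
    errs() << "non-local dependence\n"; // so a non-local query is needed to go further
}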
The alternative of intersecting all the variables used by the target load instructions with all the allocated variables is not an option, because there might be something like: alloca(a) ... c=a*b+4 .... load(c).
It also seems that DependenceAnalysis::depends() is not right for my pass. The next line of code is only for reference: if (DA.depends(allocaInstrArray[i], loadInstrArray[j], true)) is always false, and it should be true in several cases. I think I am not using it correctly.
However, I assumed that maybe depends() does not work for Allocas, so I checked the dependencies among all the Load instructions kept in an array. Some results are not based on the loaded variable as they should be. For example: LOAD %3 = load i32* %c, align 4 IS DEPENDENT ON %1 = load i32* %j, align 4. As you can see, one loads c and the other loads j. In my Test.cpp target code there is no dependence between j and c. Maybe the dependence is not based on the variables/memory locations used?
Thank you for any suggestions!
First, take getOperand(0) or getOperand(1) of the ICmp instructions. If isa<LoadInst> holds for an operand, cast it to LoadInst; getPointerOperand() then gives the Value* that is the actual variable being searched for.
Second, do the same between the Load instructions and the Alloca instructions: getOperand(0) applied to a Load gives the corresponding Alloca instruction (see the sketch below).
Finally, link the two results together by checking the dependencies.
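A minimal sketch of the first two steps, under the same assumption this answer makes, i.e. the load's pointer operand is the alloca itself with no GEPs or casts in between; findBackingAlloca is a hypothetical helper.
#include "llvm/IR/Instructions.h"
#include "llvm/Support/Casting.h"
using namespace llvm;

// Hypothetical helper: given one operand of an ICmp, look through a direct
// load and return the alloca it reads from, or null if the pattern differs.
static AllocaInst *findBackingAlloca(ICmpInst *Cmp, unsigned OpIdx) {
  if (auto *Load = dyn_cast<LoadInst>(Cmp->getOperand(OpIdx)))
    return dyn_cast<AllocaInst>(Load->getPointerOperand());
  return nullptr;
}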