Are MachineBasicBlocks supposed to implicitly fall through to their successors? - llvm

I'm debugging an LLVM target backend, and I am chasing a problem where a certain basic block ends up jumping to "nothing", i.e. just after the end of the function, when compiled with optimizations turned on.
One thing I noticed is that after instruction selection, the machine basic block has a successor but no instruction to actually jump there:
BB#1: derived from LLVM BB %switch.lookup
Predecessors according to CFG: BB#0
%vreg5<def> = SEXT %vreg2, %SREG<imp-def,dead>; DLDREGS:%vreg5 GPR8:%vreg2
%vreg6<def,tied1> = ANDIWRdK %vreg5<tied0>, -2, %SREG<imp-def,dead>; DLDREGS:%vreg6,%vreg5
%vreg7<def> = LDIWRdK 4; DLDREGS:%vreg7
%vreg8<def> = LDIRdK 0; LD8:%vreg8
%vreg9<def> = LDIRdK 1; LD8:%vreg9
CPWRdRr %vreg6<kill>, %vreg7<kill>, %SREG<imp-def>; DLDREGS:%vreg6,%vreg7
%vreg0<def> = Select8 %vreg9<kill>, %vreg8<kill>, 1, %SREG<imp-use>; GPR8:%vreg0 LD8:%vreg9,%vreg8
Successors according to CFG: BB#2(?%)
I see similar ISel results from the x86 LLVM backend and the end result doesn't have a jump-to-nothingness, so I assume this, on its own, is not a problem:
BB#1: derived from LLVM BB %switch.lookup
Predecessors according to CFG: BB#0
%vreg7<def> = MOVSX32rr8 %vreg3; GR32:%vreg7 GR8:%vreg3
%vreg8<def,tied1> = AND32ri %vreg7<tied0>, 65534, %EFLAGS<imp-def,dead>; GR32:%vreg8,%vreg7
%vreg9<def,tied1> = SUB32ri8 %vreg8<tied0>, 4, %EFLAGS<imp-def>; GR32:%vreg9,%vreg8
%vreg0<def> = SETNEr %EFLAGS<imp-use>; GR8:%vreg0
Successors according to CFG: BB#2(?%)
So my question is: what is the mechanism by which these CFG-specified successors are supposed to be turned into real jumps? Does the x86 backend implement something special for this to work that the backend I'm debugging doesn't?
Should I change my ISelLowering class to lower Select8 into something that ends with an explicit jump? Or is that unnecessary (or even detrimental, because it prevents some optimization from kicking in), and there is some other mechanism I need to implement so that these implicit successors are lowered correctly?

It is perfectly valid for a MachineBasicBlock to fall through to the next Block:
That is valid. Passes that want to reorder basic blocks should only do so if the AnalyzeBranch and related target hooks (Insert/Remove) allow it.
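To make the consequence concrete, here is a toy model in Python (not LLVM code) of when an implicit fall-through is safe. The `Block` class and `dangling_fallthroughs` helper are hypothetical names invented for this sketch. The idea is that a block without a branch to its CFG successor is only correct while that successor is the next block in layout order, so a reordering pass that cannot analyze and insert branches for the target may leave a block running off the end of the function.

```python
from dataclasses import dataclass, field

@dataclass
class Block:
    name: str
    successors: list                                     # CFG successor names
    branch_targets: list = field(default_factory=list)   # targets of explicit branches

def dangling_fallthroughs(layout):
    """Return names of blocks with a CFG successor that is reached neither by
    an explicit branch nor by falling through to the next block in layout."""
    bad = []
    for i, blk in enumerate(layout):
        next_name = layout[i + 1].name if i + 1 < len(layout) else None
        for succ in blk.successors:
            if succ not in blk.branch_targets and succ != next_name:
                bad.append(blk.name)
                break
    return bad

bb0 = Block("BB#0", ["BB#1", "BB#2"], branch_targets=["BB#1", "BB#2"])
bb1 = Block("BB#1", ["BB#2"])        # successor, but no jump: relies on fall-through
bb2 = Block("BB#2", [])

ok_layout = [bb0, bb1, bb2]          # BB#2 directly follows BB#1: fall-through works
moved = [bb0, bb2, bb1]              # BB#1 moved last without inserting a branch

print(dangling_fallthroughs(ok_layout))  # []
print(dangling_fallthroughs(moved))      # ['BB#1']
```

In the second layout BB#1 is the last block and has nothing to fall through to, which matches the "jump to nothing" symptom described in the question. Whether that is what happens in the asker's backend is an assumption, not something the dump proves.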

Related

LLVM Metadata reference instruction

I want to create an MDNode that references another instruction.
MDNode *mdNode = MDNode::get(ctx, llvm::LocalAsMetadata::get(decider)); // decider is an instruction
phi->setMetadata("carry", mdNode); // phi is an instruction
Unfortunately the verifier fails with "Invalid operand for global metadata!" I'm starting to think that this is not possible with the current Metadata API (it seems it might have been handled in the past). Any thoughts?

How to get FLOPS in RISC-V using SW or HW method?

I am new to RISC-V and wonder how I could measure FLOPS with a software or hardware method. I tried to use CSRs for this, but ran into some problems.
As far as I know, if I redesign an hpmcounter so that it counts every floating-point operation event, I can get FLOPS by reading it with a CSR read instruction. There is a similar design in the manual for SiFive's rocket-chip-based U54 core: the manual describes sophisticated event-counting capabilities controlled by the mhpmevent CSRs. If I set the lower eight bits of mhpmevent to 0 and enable bits [19-25], I can read the count from mhpmcounter. I want to design this field the way the SiFive core does.
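As a small illustration of the encoding described above, the following Python sketch builds the value that would be written to mhpmevent. The helper names `mhpmevent_value` and `events_per_second` are hypothetical, and the bit layout (event class in bits 7:0, one mask bit per event above that) is taken from the description in this question, not checked against a specific core manual.

```python
def mhpmevent_value(event_class, event_bits):
    """Combine an event class (low eight bits) with a set of event mask bits."""
    if not 0 <= event_class <= 0xFF:
        raise ValueError("event class must fit in the low eight bits")
    value = event_class
    for bit in event_bits:
        if bit < 8:
            raise ValueError("mask bits start above the event class field")
        value |= 1 << bit
    return value

def events_per_second(count_delta, cycle_delta, clock_hz):
    """Turn a counter difference into a rate, given elapsed cycles and clock."""
    return count_delta * clock_hz / cycle_delta

# Event class 0 with bits 19 through 25 enabled, as in the question.
fp_events = mhpmevent_value(0, range(19, 26))
print(hex(fp_events))  # 0x3f80000
```

Note that a counter set up this way counts retired floating-point instructions, so a fused multiply-add would count once; converting the result into FLOPS in the strict sense needs a per-event weight, which this sketch leaves out.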
I tried to imitate it for FLOPS, but I encountered some problems.
I can't access mhpmcounter; I get an illegal instruction error, as shown at the following link.
illegal instruction error message!!
I wrote a simple test program and it compiles successfully, but I get an illegal instruction error when I run it on spike and on the cycle-accurate emulator. Both use the proxy kernel.
// simple test code
unsigned long instret1 = 0;
unsigned long instret2 = 0;
float a,b,c;
a = 5.0;
b = 4.0;
asm volatile ("csrrs %0, mhpmcounter3, x0 " : "=r"(instret1));
c = a + b;
asm volatile ("csrrs %0, mhpmcounter3, x0 " : "=r"(instret2));
printf("instruction count : %lu \n", instret2-instret1);
It is hard to switch from user mode to M-mode to access mhpmevent and mhpmcounter. In the RISC-V privileged spec 1.10, I found that the xRET instructions can change the privilege mode. The following text from the spec describes xRET:
The MRET, SRET, or URET instructions are used to return from traps in M-mode, S-mode, or U-mode respectively. When executing an xRET instruction, supposing xPP holds the value y, xIE is set to xPIE; the privilege mode is changed to y; xPIE is set to 1; and xPP is set to U (or M if user-mode is not supported).
If someone knows how to do this, I would like to see the detailed assembly code.
I tried modifying rocket-chip/src/main/scala/rocket/CSR.scala to redesign the CSR. Is that the only way? First, I want to use spike to test the counter value. How should I change the code?
If anybody has other ideas or has already done this, please point me in the right direction. Thanks!

How to pass several previous states using scan in Tensorflow.

I'm modifying DRAW (Deep Recurrent Attentive Writer) code that someone else shared here so that it handles variable-length sequences using the tf.scan function. To do that, I need to change the for loop in the original code into a structure suitable for scan. Below is the relevant part of the original code:
...
for t in range(T):
    c_prev = tf.zeros((batch_size,img_size)) if t==0 else cs[t-1]
    x_hat=x-tf.sigmoid(c_prev) # error image
    r=read(x,x_hat,h_dec_prev)
    h_enc,enc_state=encode(enc_state,tf.concat(1,[r,h_dec_prev]))
    z,mus[t],logsigmas[t],sigmas[t]=sampleQ(h_enc)
    h_dec,dec_state=decode(dec_state,z)
    cs[t]=c_prev+write(h_dec) # store results
    h_dec_prev=h_dec
    DO_SHARE=True # from now on, share variables
...
In order to use tf.scan, I need to pass several previous states (c_prev, h_dec_prev, ...). However, as far as I know, tf.scan only carries one tensor through the loop (is that right?), as in the example here:
elems = np.array([1, 2, 3, 4, 5, 6])
sum = scan(lambda a, x: a + x, elems)
It seems there can be only one a, and it has to be a tensor. In that case, the only way I can think of is to flatten the different state tensors and concatenate them. But I'm worried that this will make the code messy and slow it down a lot, especially when the state sizes all differ. Is there an efficient (and fast) way to handle this kind of problem?
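For what it's worth, the accumulator of a scan does not have to be a single value. The tf.scan documentation describes the initializer as a tensor or a (possibly nested) sequence of tensors, in which case the accumulator passed to the function has that same structure; it is worth checking this against the TensorFlow version in use. The sketch below shows the pattern in plain Python, without TensorFlow. The `scan` and `step` functions are hypothetical stand-ins, and the arithmetic in `step` is only a placeholder for the encoder/decoder updates.

```python
from itertools import accumulate

def scan(fn, elems, initializer):
    """Pure-Python stand-in for a scan: the accumulator after every step."""
    # accumulate() also yields the initial value first, so drop it.
    return list(accumulate(elems, fn, initial=initializer))[1:]

def step(state, x):
    c_prev, h_prev = state      # unpack several previous states at once
    h = 0.5 * h_prev + x        # placeholder for the encode/decode update
    c = c_prev + h              # placeholder for c_prev + write(h_dec)
    return (c, h)               # same structure as the initializer

states = scan(step, [1.0, 2.0, 3.0], (0.0, 0.0))
print(states)  # [(1.0, 1.0), (3.5, 2.5), (7.75, 4.25)]
```

Carrying a tuple like this avoids flattening and concatenating states of different sizes, because each component keeps its own shape.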

How to change configuration of network during simulation in OMNeT++?

I want to modify some parameters from an element's .ini file in OMNeT++, say a node's transmission rate, during the simulation run, e.g. when a node receives some control message.
I found information saying that it's possible to loop over configurations with something like some_variable = ${several values}, but there are no conditional clauses in .ini files and no way to pass those files any data from C++ functions (as far as I know).
I use INET, but maybe users of other models have already dealt with such a problem.
In fact, you can use the built-in constraint expression in the INI file. This allows you to create runs for the given configuration while respecting the specified constraint (condition).
However, this constraint only applies to the parameters that are specified in the .ini file, i.e. it won't help if the variable you are trying to change is computed dynamically in the code.
Below is a rather complicated snippet from an .ini file that uses many of the built-in features you mentioned (variable iteration, conditionals, etc.):
# Parameter assignment using iteration loops and constraints #
# first define the static values on which the others depend #
scenario.node[*].application.ADVlowerBound = ${t0= 0.1}s
scenario.node[*].application.aggToServerUpperBound = ${t3= 0.9}s
#
## assign values to "dependent" parameters using variable names and loop iterations #
scenario.node[*].application.ADVupperBound = ${t1= ${t0}..${t3} step 0.1}s # ADVupperBound == t1; t1 will take values starting from t0 to t3 (0.1 - 0.9) iterating 0.1
scenario.node[*].application.CMtoCHupperBound = ${t2= ${t0}..${t3} step 0.1}s
#
## connect "dependent" parameters to their "copies" -- this part of the snippet is only variable assignment.
scenario.node[*].application.CMtoCHlowerBound = ${t11 = ${t1}}s
scenario.node[*].application.joinToServerLowerBound = ${t12 = ${t1}}s
#
scenario.node[*].application.aggToServerLowerBound = ${t21 = ${t2}}s
scenario.node[*].application.joinToServerUpperBound = ${t22 = ${t2}}s
#
constraint = ($t0) < ($t1) && ($t1) < ($t2) && ($t2) < ($t3)
# END END END #
The code above creates all the possible combinations of time values for t0 to t3, where they can take values between 0.1 and 0.9.
t0 and t3 are the beginning and the end points, respectively. t1 and t2 take values based on them.
t1 will take values between t0 and t3 each time being incremented by 0.1 (see the syntax above). The same is true for t2 too.
However, I want t0 to always be smaller than t1, t1 smaller than t2, and t2 smaller than t3. I specify these conditions in the constraint section.
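To see how many runs such a configuration yields, the iteration and the constraint can be mimicked in a few lines of Python. This is only a model of the cross product described above, not OMNeT++ itself; times are kept as integer tenths of a second to avoid floating-point comparisons, and the variable names are chosen for this sketch.

```python
from itertools import product

T0, T3 = 1, 9                      # 0.1 s and 0.9 s, in tenths of a second
candidates = range(T0, T3 + 1)     # models ${t0}..${t3} step 0.1

# Without a constraint, every (t1, t2) pair would become a run.
all_pairs = list(product(candidates, candidates))

# constraint = t0 < t1 && t1 < t2 && t2 < t3
runs = [(T0, t1, t2, T3) for t1, t2 in all_pairs if T0 < t1 < t2 < T3]

print(len(all_pairs))  # 81
print(len(runs))       # 21
print(runs[0])         # (1, 2, 3, 9)
```

The copies t11, t12, t21 and t22 each take a single value per run, so they do not multiply the number of combinations.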
I am sure a thorough read through this section of the manual will help you find the solution.
If you want to change some value during the simulation, you can just do that in your C++ code. Something like:
void YourModule::handleMessage(cMessage *msg) { // YourModule stands for your own module class
    if (msg->getKind() == yourKind) { // replace yourKind with the one you are using for these messages
        transmission_rate = new_value;
    }
}
What you are referring to as some_variable = ${several values} can be used to perform multiple runs with different parameters, for example one run with a rate of 1s, one with 2s and one with 10s. That would then be:
transmission_rate = ${1, 2, 10}s
For more detailed information on how to use such values (e.g. to do loops), see the relevant section in the OMNeT++ User Manual.
While you can certainly manually change volatile parameters, OMNeT++ (as far as I am aware) offers no integrated support for automatic changing of parameters at runtime.
You can, however, write some model code that changes volatile parameters programmatically.

llvm alloca dependencies

From my pass, I am trying to determine, for certain Load instructions, their corresponding Alloca instructions (which can be in earlier blocks). The chain can be something like: TargetLoad(var) -> other stores/loads that use var (or depend on var) -> alloca(var), spread over several basic blocks. Do you know how I can do this?
I tried the methods from DependenceAnalysis and MemoryDependenceAnalysis, but the results were not correct. For instance, MemoryDependenceAnalysis::getDependency should be suitable with the "Def" option, but it works only for stores, not for loads. I also get a segfault when trying to use MemoryDependenceAnalysis::getNonLocalPointerDependency or MemoryDependenceAnalysis::getPointerDependencyFrom. When I check my result using MemDepResult::getDef(), the result for a Load instruction is the same instruction! So it depends on itself, which is weird, since it uses a variable that is defined earlier in the code.
The alternative of intersecting all the variables used by the target load instructions with all the allocated variables is not an option, because there might be something like: alloca(a) ... c=a*b+4 .... load(c).
It also seems that DependenceAnalysis::depends() does not work for my pass. For reference, if(DA.depends(allocaInstrArray[i],loadInstrArray[j],true)) is always false, although it should be true in several cases. I think I am not using it correctly.
I then assumed that maybe depends() does not work for Alloca, so I checked the dependencies among all Load instructions kept in an array. Some results are not based on the loaded variable, as they should be. For example: LOAD %3 = load i32* %c, align 4 IS DEPENDENT ON %1 = load i32* %j, align 4. As you can see, one loads c and the other loads j. In my Test.cpp target code there is no dependence between j and c. Maybe the dependence is not based on the variables/memory locations used?
Thank you for any suggestions!
First, use getOperand(0) or getOperand(1) of the ICMP instructions. If isa<LoadInst> holds for an operand, cast it to LoadInst; getPointerOperand() then returns the Value* for the actual variable being searched for.
Second, apply the same procedure between Load instructions and Alloca instructions. When a Load reads directly from a stack slot, getOperand(0) on the Load gives the corresponding Alloca instruction.
Finally, link the two results together by checking the dependencies.
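The operand-walking idea in the steps above can be sketched on a toy representation in Python. This is not LLVM code: the `defs` dictionary and the `find_alloca` helper are hypothetical, with each SSA name mapped to its opcode and operands, and `%p` is an invented intermediate address computation. Only SSA operands are followed; values that travel through memory via a store and a later load are not tracked.

```python
def find_alloca(value, defs, seen=None):
    """Follow operands backwards from `value` until alloca definitions are hit.

    defs maps an SSA name to a pair (opcode, list of operand names).
    Returns the set of alloca names the value is derived from.
    """
    seen = set() if seen is None else seen
    if value in seen or value not in defs:
        return set()                # already visited, or a constant/argument
    seen.add(value)
    opcode, operands = defs[value]
    if opcode == "alloca":
        return {value}
    found = set()
    for op in operands:
        found |= find_alloca(op, defs, seen)
    return found

defs = {
    "%c": ("alloca", []),
    "%j": ("alloca", []),
    "%p": ("getelementptr", ["%c"]),   # hypothetical address derived from %c
    "%3": ("load", ["%p"]),
    "%1": ("load", ["%j"]),
}

print(find_alloca("%3", defs))  # {'%c'}
print(find_alloca("%1", defs))  # {'%j'}
```

In an actual pass the same walk would presumably start from getPointerOperand() on the load and test each value with isa<AllocaInst>, but that mapping is an assumption of this sketch.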