How to determine if a BasicBlock is controled by a `if` - llvm

I want to use LLVM to analyze if a basic block is affected by a control flow of an if(i.e., br instruction). "A basic block BB is NOT affected by br" means that no matter which of two blocks the br goes to BB will be executed for sure. I use an example to briefly show what I want:
My current rule to determine if a basic block BB is affected is (if true, affected.)
¬(postDominate(BB, BranchInst->leftBB) && postDominate(BB, BranchInst->rightBB))
Since I can not exhaustively test the rule above to all possible CFGs, I want to know if this rule is sound and complete.
Thanks!
Further question
I am also confused if I should use dominate rather than postDominate like (I know the difference between post-dominance and dominance, but which should I use? Both rules seems work in this example, but I am not sure which one will/won't work in other cases):
Dominate(BranchInst->leftBB, BB) || Dominate(BranchInst->rightBB, BB)

A block Y is control dependent on block X if and only if Y postdominates at least one but not all successors of X.
llvm::ReverseIDFCalculator in llvm/Analysis/IteratedDominanceFrontier.h will calculate the post-dominance frontier for you, which is exactly what you need. The "iterated" part isn't relevant to your use case, ignore the setLiveInBlocks() method.
I have not tested this at all, but I expect that something like this should do the trick:
// PDT is the llvm::PostDominatorTree
SmallPtrSet<BasicBlock *, 1> BBSet{block_with_branch};
SmallVector<BasicBlock *, 32> IDFBlocks;
ReverseIDFCalculator IDFs(PDT);
IDFs.setDefiningBlocks(BBSet);
IDFs.calculate(IDFBlocks);

The relation control dependent is transitive. Applying the definition to all the control-dependent-impacted block(s) iteratively is the right way.

Related

Maxima: creating a function that acts on parts of a string

Context: I'm using Maxima on a platform that also uses KaTeX. For various reasons related to content management, this means that we are regularly using Maxima functions to generate the necessary KaTeX commands.
I'm currently trying to develop a group of functions that will facilitate generating different sets of strings corresponding to KaTeX commands for various symbols related to vectors.
Problem
I have written the following function makeKatexVector(x), which takes a string, list or list-of-lists and returns the same type of object, with each string wrapped in \vec{} (i.e. makeKatexVector(string) returns \vec{string} and makeKatexVector(["a","b"]) returns ["\vec{a}", "\vec{b}"] etc).
/* Flexible Make KaTeX Vector Version of List Items */
makeKatexVector(x):= block([ placeHolderList : x ],
if stringp(x) /* Special Handling if x is Just a String */
then placeHolderList : concat("\vec{", x, "}")
else if listp(x[1]) /* check to see if it is a list of lists */
then for j:1 thru length(x)
do placeHolderList[j] : makelist(concat("\vec{", k ,"}"), k, x[j] )
else if listp(x) /* check to see if it is just a list */
then placeHolderList : makelist(concat("\vec{", k, "}"), k, x)
else placeHolderList : "makeKatexVector error: not a list-of-lists, a list or a string",
return(placeHolderList));
Although I have my doubts about the efficiency or elegance of the above code, it seems to return the desired expressions; however, I would like to modify this function so that it can distinguish between single- and multi-character strings.
In particular, I'd like multi-character strings like x_1 to be returned as \vec{x}_1 and not \vec{x_1}.
In fact, I'd simply like to modify the above code so that \vec{} is wrapped around the first character of the string, regardless of how many characters there may be.
My Attempt
I was ready to tackle this with brute force (e.g. transcribing each character of a string into a list and then reassembling); however, the real programmer on the project suggested I look into "Regular Expressions". After exploring that endless rabbit hole, I found the command regex_subst; however, I can't find any Maxima documentation for it, and am struggling to reproduce the examples in the related documentation here.
Once I can work out the appropriate regex to use, I intend to implement this in the above code using an if statement, such as:
if slength(x) >1
then {regex command}
else {regular treatment}
If anyone knows of helpful resources on any of these fronts, I'd greatly appreciate any pointers at all.
Looks like you got the regex approach working, that's great. My advice about handling subscripted expressions in TeX, however, is to avoid working with names which contain underscores in Maxima, and instead work with Maxima expressions with indices, e.g. foo[k] instead of foo_k. While writing foo_k is a minor convenience in Maxima, you'll run into problems pretty quickly, and in order to straighten it out you might end up piling one complication on another.
E.g. Maxima doesn't know there's any relation between foo, foo_1, and foo_k -- those have no more in common than foo, abc, and xyz. What if there are 2 indices? foo_j_k will become something like foo_{j_k} by the preceding approach -- what if you want foo_{j, k} instead? (Incidentally the two are foo[j[k]] and foo[j, k] when represented by subscripts.) Another problematic expression is something like foo_bar_baz. Does that mean foo_bar[baz], foo[bar_baz] or foo_bar_baz?
The code for tex(x_y) yielding x_y in TeX is pretty old, so it's unlikely to go away, but over the years I've come to increasing feel like it should be avoided. However, the last time it came up and I proposed disabling that, there were enough people who supported it that we ended up keeping it.
Something that might be helpful, there is a function texput which allows you to specify how a symbol should appear in TeX output. For example:
(%i1) texput (v, "\\vec{v}");
(%o1) "\vec{v}"
(%i2) tex ([v, v[1], v[k], v[j[k]], v[j, k]]);
$$\left[ \vec{v} , \vec{v}_{1} , \vec{v}_{k} , \vec{v}_{j_{k}} ,
\vec{v}_{j,k} \right] $$
(%o2) false
texput can modify various aspects of TeX output; you can take a look at the documentation (see ? texput).
While I didn't expect that I'd work this out on my own, after several hours, I made some progress, so figured I'd share here, in case anyone else may benefit from the time I put in.
to load the regex in wxMaxima, at least on the MacOS version, simply type load("sregex");. I didn't have this loaded, and was trying to work through our custom platform, which cost me several hours.
take note that many of the arguments in the linked documentation by Dorai Sitaram occur in the reverse, or a different order than they do in their corresponding Maxima versions.
not all the "pregexp" functions exist in Maxima;
In addition to this, escaping special characters varied in important ways between wxMaxima, the inline Maxima compiler (running within Ace editor) and the actual rendered version on our platform; in particular, the inline compiler often returned false for expressions that compiled properly in wxMaxima and on the platform. Because I didn't have sregex loaded on wxMaxima from the beginning, I lost a lot of time to this.
Finally, the regex expression that achieved the desired substitution, in my case, was:
regex_subst("\vec{\\1}", "([[:alpha:]])", "v_1");
which returns vec{v}_1 in wxMaxima (N.B. none of my attempts to get wxMaxima to return \vec{v}_1 were successful; escaping the backslash just does not seem to work; fortunately, the usual escaped version \\vec{\\1} does return the desired form).
I have yet to adjust the code for the rest of the function, but I doubt that will be of use to anyone else, and wanted to be sure to post an update here, before anyone else took time to assist me.
Always interested in better methods / practices or any other pointers / feedback.

If... else condition in LUA language

I'm new to coding and I recently started to take my baby steps in LUA. I have a small problem so it would be very helpful if you can help me. In my code, I need to code that
If x ~= 1 and x~=2 and x~=3 and x~=4 then (do something) end
is there a faster way not to hardcode that part, not to type the whole thing from x~=1 to x~=4?
Thank you!
If you need something like if x ~= 1 and x~=2 and x~=3 and x~=4 then (do something) end x is usually an integer.
Then
if x < 1 or x > 4 then
-- do your stuff here
end
Is what you are looking for. If you want to explicitly check wether x is unquald 1,2,3,4 you can simply do something like Egor suggested.
But as you see unless you can describe your conditions in a shorter mathematical way you still have separate unique conditions and you won't come around writing them down.
If you have to check those conditions repeatedly you can use a truth table like in Egor's example or you write a function that returns if that condition is met for its argument.

Generating branch instructions from an AST for a conditional expression

I am trying to write a compiler for a domain-specific language, targeting a stack-machine based VM that is NOT a JVM.
I have already generated a parser for my language, and can readily produce an AST which I can easily walk. I also have had no problem converting many of the statements of my language into the appropriate instructions for this VM, but am facing an obstacle when it comes to the matter of handling the generation of appropriate branching instructions when complex conditionals are encountered, especially when they are combined with (possibly nested) 'and'-like or 'or' like operations which should use short-circuiting branching as applicable.
I am not asking anyone to write this for me. I know that I have not begun to describe my problem in sufficient detail for that. What I am asking for is pointers to useful material that can get me past this hurdle I am facing. As I said, I am already past the point of converting about 90% of the statements in my language into applicable instructions, but it is the handling of conditionals and generating the appropriate flow control instructions that has got me stumped. Much of the info that I have been able to find so far on generating code from an AST only seems to deal with the generation of code corresponding to simple imperative-like statements, but the handing of conditionals and flow control appears to be much more scarce.
Other than the short-circuiting/lazy-evaluation mechanism for 'and' and 'or' like constructs that I have described, I am not concerned with handling any other optimizations.
Every conditional control flow can be modelled as a flow chart (or flow graph) in which the two branches of the conditional have different targets. Given that boolean operators short-circuit, they are control flow elements rather than simple expressions, and they need to be modelled as such.
One way to think about this is to rephrase boolean operators as instances of the ternary conditional operator. So, for example, A and B becomes A ? B : false and A or B becomes A ? true : B [Note 1]. Note that every control flow diagram has precisely two output points.
To combine boolean expressions, just substitute into the diagram. For example, here's A AND (B OR C)
You implement NOT by simply swapping the meaning of the two out-flows.
If the eventual use of the boolean expression is some kind of conditional, such as an if statement or a conditional loop, you can use the control flow as is. If the boolean expression is to be saved into a variable or otherwise used as a value, you need to fill in the two outflows with code to create the relevant constant, usually a true or false boolean constant, or (in C-like languages) a 1 or 0.
Notes:
Another way to write this equivalence is A and B ⇒ A ? B : A; A or B ⇒ A ? A : B, but that is less useful for a control flow view, and also clouds the fact that the intent is to only evaluate each expression once. This form (modified to reuse the initial computation of A) is commonly used in languages with multiple "falsey" values (like Python).

Halide 'select' without evaluating both arguments

As you can know if you tried Halide select(x,y,z); is something similar to the ternary operator on C++ where x is the conditional y if true and z if false.
Imagine that y is just return 0 and z is a really costly function, it could have sense to skip evaluating z where x is false, unfortunatly Halide evaluates both terms even if I set select(x,likely(y),z); or at least it happens if I use compile_to_file (.h + .lib)
Any idea about this?
Thank you!
The effect of the likely intrinsic is limited to loop peeling, not to anywhere a select might be used. That is, this only has an effect if the condition is closely related to the coordinates of the function definition in which the select appears (as in boundary conditions on an image, where the select is predicated on the x and y coordinates of the function). It does not turn arbitrary select expressions into full branching if/else statements.
You can see some examples in the tests for the intrinsic.
If you share an actual running piece of code it will be easier to discuss why loop peeling and the likely intrinsic do or don't apply in your particular case.

How can I avoid using the stack with continuation-passing style?

For my diploma thesis I chose to implement the task of the ICFP 2004 contest.
The task--as I translated it to myself--is to write a compiler which translates a high-level ant-language into a low-level ant-assembly. In my case this means using a DSL written in Clojure (a Lisp dialect) as the high-level ant-language to produce ant-assembly.
UPDATE:
The ant-assembly has several restrictions: there are no assembly-instructions for calling functions (that is, I can't write CALL function1, param1), nor returning from functions, nor pushing return addresses onto a stack. Also, there is no stack at all (for passing parameters), nor any heap, or any kind of memory. The only thing I have is a GOTO/JUMP instruction.
Actually, the ant-assembly is for to describe the transitions of a state machine (=the ants' "brain"). For "function calls" (=state transitions) all I have is a JUMP/GOTO.
While not having anything like a stack, heap or a proper CALL instruction, I still would like to be able to call functions in the ant-assembly (by JUMPing to certain labels).
At several places I read that transforming my Clojure DSL function calls into continuation-passing style (CPS) I can avoid using the stack[1], and I can translate my ant-assembly function calls into plain JUMPs (or GOTOs). Which is exactly what I need, because in the ant-assembly I have no stack at all, only a GOTO instruction.
My problem is that after an ant-assembly function has finished, I have no way to tell the interpreter (which interprets the ant-assembly instructions) where to continue. Maybe an example helps:
The high-level Clojure DSL:
(defn search-for-food [cont]
(sense-food-here? ; a conditional w/ 2 branches
(pickup-food ; true branch, food was found
(go-home ; ***
(drop-food
(search-for-food cont))))
(move ; false branch, continue searching
(search-for-food cont))))
(defn run-away-from-enemy [cont]
(sense-enemy-here? ; a conditional w/ 2 branches
(go-home ; ***
(call-help-from-others cont))
(search-for-food cont)))
(defn go-home [cont]
(turn-backwards
; don't bother that this "while" is not in CPS now
(while (not (sense-home-here?))
(move)))
(cont))
The ant-assembly I'd like to produce from the go-home function is:
FUNCTION-GO-HOME:
turn left nextline
turn left nextline
turn left nextline ; now we turned backwards
SENSE-HOME:
sense here home WE-ARE-AT-HOME CONTINUE-MOVING
CONTINUE-MOVING:
move SENSE-HOME
WE-ARE-AT-HOME:
JUMP ???
FUNCTION-DROP-FOOD:
...
FUNCTION-CALL-HELP-FROM-OTHERS:
...
The syntax for the ant-asm instructions above:
turn direction which-line-to-jump
sense direction what jump-if-true jump-if-false
move which-line-to-jump
My problem is that I fail to find out what to write to the last line in the assembly (JUMP ???). Because--as you can see in the example--go-home can be invoked with two different continuations:
(go-home
(drop-food))
and
(go-home
(call-help-from-others))
After go-home has finished I'd like to call either drop-food or call-help-from-others. In assembly: after I arrived at home (=the WE-ARE-AT-HOME label) I'd like to jump either to the label FUNCTION-DROP-FOOD or to the FUNCTION-CALL-HELP-FROM-OTHERS.
How could I do that without a stack, without PUSHing the address of the next instruction (=FUNCTION-DROP-FOOD / FUNCTION-CALL-HELP-FROM-OTHERS) to the stack? My problem is that I don't understand how continuation-passing style (=no stack, only a GOTO/JUMP) could help me solving this problem.
(I can try to explain this again if the things above are incomprehensible.)
And huge thanks in advance for your help!
--
[1] "interpreting it requires no control stack or other unbounded temporary storage". Steele: Rabbit: a compiler for Scheme.
Yes, you've provided the precise motivation for continuation-passing style.
It looks like you've partially translated your code into continuation-passing-style, but not completely.
I would advise you to take a look at PLAI, but I can show you a bit of how your function would be transformed, assuming I can guess at clojure syntax, and mix in scheme's lambda.
(defn search-for-food [cont]
(sense-food-here? ; a conditional w/ 2 branches
(search-for-food
(lambda (r)
(drop-food r
(lambda (s)
(go-home s cont)))))
(search-for-food
(lambda (r)
(move r cont)))))
I'm a bit confused by the fact that you're searching for food whether or not you sense food here, and I find myself suspicious that either this is weird half-translated code, or just doesn't mean exactly what you think it means.
Hope this helps!
And really: go take a look at PLAI. The CPS transform is covered in good detail there, though there's a bunch of stuff for you to read first.
Your ant assembly language is not even Turing-complete. You said it has no memory, so how are you supposed to allocate the environments for your function calls? You can at most get it to accept regular languages and simulate finite automata: anything more complex requires memory. To be Turing-complete you'll need what amounts to a garbage-collected heap. To do everything you need to do to evaluate CPS terms you'll also need an indirect GOTO primitive. Function calls in CPS are basically (possibly indirect) GOTOs that provide parameter passing, and the parameters you pass require memory.
Clearly, your two basic options are to inline everything, with no "external" procedures (for extra credit look up the original meaning of "internal" and "external" here), or somehow "remember" where you need to go on "return" from a procedure "call" (where the return point does not necessarily need to fall in the physical locations immediately following the "calling" point). Basically, the return point identifier can be a code address, an index into a branch table, or even a character symbol -- it just needs to identify the return target relative to the called procedure.
The most obvious here would be to track, in your compiler, all of the return targets for a given call target, then, at the end of the called procedure, build a branch table (or branch ladder) to select from one of the several possible return targets. (In most cases there are only a handful of possible return targets, though for commonly used procedures there could be hundreds or thousands.) Then, at the call point, the caller needs to load a parameter with the index of its return point relative to the called procedure.
Obviously, if the callee in turn calls another procedure, the first return point identifier must be preserved somehow.
Continuation passing is, after all, just a more generalized form of a return address.
You might be interested in Andrew Appel's book Compiling with Continuations.