Generating branch instructions from an AST for a conditional expression - if-statement

I am trying to write a compiler for a domain-specific language, targeting a stack-machine based VM that is NOT a JVM.
I have already generated a parser for my language, and can readily produce an AST which I can easily walk. I also have had no problem converting many of the statements of my language into the appropriate instructions for this VM, but am facing an obstacle when it comes to the matter of handling the generation of appropriate branching instructions when complex conditionals are encountered, especially when they are combined with (possibly nested) 'and'-like or 'or' like operations which should use short-circuiting branching as applicable.
I am not asking anyone to write this for me. I know that I have not begun to describe my problem in sufficient detail for that. What I am asking for is pointers to useful material that can get me past this hurdle I am facing. As I said, I am already past the point of converting about 90% of the statements in my language into applicable instructions, but it is the handling of conditionals and generating the appropriate flow control instructions that has got me stumped. Much of the info that I have been able to find so far on generating code from an AST only seems to deal with the generation of code corresponding to simple imperative-like statements, but the handing of conditionals and flow control appears to be much more scarce.
Other than the short-circuiting/lazy-evaluation mechanism for 'and' and 'or' like constructs that I have described, I am not concerned with handling any other optimizations.

Every conditional control flow can be modelled as a flow chart (or flow graph) in which the two branches of the conditional have different targets. Given that boolean operators short-circuit, they are control flow elements rather than simple expressions, and they need to be modelled as such.
One way to think about this is to rephrase boolean operators as instances of the ternary conditional operator. So, for example, A and B becomes A ? B : false and A or B becomes A ? true : B [Note 1]. Note that every control flow diagram has precisely two output points.
To combine boolean expressions, just substitute into the diagram. For example, here's A AND (B OR C)
You implement NOT by simply swapping the meaning of the two out-flows.
If the eventual use of the boolean expression is some kind of conditional, such as an if statement or a conditional loop, you can use the control flow as is. If the boolean expression is to be saved into a variable or otherwise used as a value, you need to fill in the two outflows with code to create the relevant constant, usually a true or false boolean constant, or (in C-like languages) a 1 or 0.
Notes:
Another way to write this equivalence is A and B ⇒ A ? B : A; A or B ⇒ A ? A : B, but that is less useful for a control flow view, and also clouds the fact that the intent is to only evaluate each expression once. This form (modified to reuse the initial computation of A) is commonly used in languages with multiple "falsey" values (like Python).

Related

Should branches of code caused by short-circuiting be considered for code coverage?

The regular branch coverage will require two unit tests to cover a simple if statement. But if there is combound condition like if (A && B), from the control flow graph perspective, there is an additional branch if short-circuiting is used. This is in concert with cyclomatic complexity count, which gives 3 (also applying the rules that each logical operators increases the complexity by 1, because a decision node is created in case of short-circuiting). But as far as I know, the code analyzer do not consider those branches.
Is it worth covering them anyway to make sure no side-effects result from partial evaluation of the expression?
It depends on your purpose in analysing the code, naturally.
FAA generally recommends (for example DOT/FAA/AR-06/54 "Software Verification Tools Assessment Study", Final Report, June 2007. Section 4.2.5) that all operands to a short-circuited operator (including C's ternary operator, as well as the boolean operators) be interpreted as decisions - like you describe. For higher design assurance levels (particularly catastrophic and hazardous), the objectives to be satisfied under relevant standards (DO178, DO254, etc) have an effect of requiring coverage of all possible decisions - with increasing independence at higher DALs.
So, assuming your application requires higher DAL, the answer would normally be yes. The alternative would be to construct a specific argument to support a claim that such coverage is not needed to meet objectives of your analysis or testing - and convince reviewers to accept that argument. Such an argument might need to be constructed for every instance of short circuiting.
You can argue about this in various ways, but for me, the fact that:
if (a && b) { X=... }
is exactly equivalent (often defined as):
if (a)
{ if (b) { X=... }
}
means your answer to coverage for the && operator should be identical to the answer for the definitional equivalent. Similarly for the || operator.

Direct access to Theory decision procedures using z3

Is there a z3 c++ api for direct query of a theory decision procedure?
Meaning, given a set of theory predicates, I would like to check whether they are conflicting in some given theory, without calling the z3 prover on their conjunction.
For example, I would like to check whether the following set of predicates in equality logic are conflicting:
x=y, y=z, x!=z
You can probably use some tactics, depending on what parts of the theories exactly you need (see Z3 Strategies).
If you want only very quick no-solving-at-all checks, then you should use the simplifier, it will apply the rewrite rules for all theories and return a simplified expression, which could be either true or false. The rewriter is also used to evaluate expressions given some model, so it supports everything that is needed to evaluate an expression when all the variable/constants/functions have an interpretation (and model completion can be enabled to fill in the missing ones).

What are some alternative forms of if > then relationships?

Traditional if > then relationship in pseudo code:
if (x>y) {
then print "x is greater than y."
}
There are also relational databases.
Or just visual if>then tables. A visual table representation.
There are also tree or hierarchical structure if>then programming aids.
I'm looking for any and all alternatives and flavors of if>then constructs, but preferably practical ones. Since most humans are better at using and remembering visual constructs (tables vs raw code) than symbolic constructs, I'm looking for the most intuitive way to theoretically construct an if>then rule engine, graphically.
Note: I'm not trying to implement this, I'm just trying to get an idea of what could theoretically be done.
I hope I've interpreted the question correctly.
Everything eventually boils down to comparisons, its just a matter of breaking up these comparisons in manageable chunks for humans. There are many techniques to reduce if-thens, or at least transform them into something easier to understand.
One example would be polymorphism. This frees the programmer from one instance of if/then (basically a switch statement). Another example is maps. The implementation of a map uses if/thens, but one might pre-populate the map with all the data and use one logical piece of code instead of using if/then to differentiate. This moves to a data-driven approach. Another example is SQL; it is just a language, a higher level construct, that enables us to express conditions and constraints differently. How you choose to express these conditions is dependent on the problem domain. Some problems work well with traditional procedural programming, some with logic programming, declarative programming etc. If there are many levels of nested if-thens, a state machine approach might work well. Aspect-oriented programming tries to solve the problem of duplicated code in modules that doesn't belong specifically to any one module; a concern that "cross-cuts".
I would do some reading on Programming Paradigms. Do lots of research and if you run into a recurring problem, see if another approach allows you to reduce the amount of if-thens. Most times someone else has run into the same problem and come up with a solution.
Your question is a bit broad and we could ramble from logical gates to mathematical functions. I'm going to focus on this particular bit:
"I'm looking for the most intuitive way to theoretically construct an if>then rule engine, graphically".
First, two caveats:
The best representation depends on the number of possible rules. What works for 3-4 rules probably won't work for 30-40.
I'm going to pretend that else conditions don't exist.
If "X then Y" boils down to: one condition and one instruction whose execution depends on the condition. Let's pretend X -> Y means that "If X is true then Y is executed". Let's create two sets: one is C that contains all the possible conditions. The other one is I which contains all the possible instructions.
With this is mind, X ∈ C and Y ∈ I. In your specific case, can Y ∈ C (can Y be a condition)? If so, you have nested ifs.
Nested ifs can be represented as chains of conditions joined by and operators:
if (x > 3) {
if (y > 5) {
# do something
}
}
Can be written as:
if (x > 3 and y > 5) {
# do something
}
If you're only thinking about code then the latter can become problematic when you have many nested conditions, but when you go graphical, nesting (probably using tree-like structures) can look cluttered while chaining usually looks like a sequence of instructions (which I think is better).
If you don't consider nesting (chaining) in your rules, then connecting elements (boxes, circles, etc) from X -> Y is trivial way to work. The representation of this depends on how graphical you want to get (see the links below for some examples).
If you're considering nesting then three random ideas come to my mind:
Venn Diagrams: Visually attractive, useless for more than 3-4 conditions. They have a good fit with database representations. See: http://share.mheroin.com/image/3i3l1y0S2F39
Flowcharts: Highly functional and easy to read, not too cumbersome to create. Can get out of hand with 10+ elements. See: http://share.mheroin.com/image/2g071j3U1u29
Tables: As you mentioned, tables are a decent way to represent conditionals as long as you can restrain the set of applicable rules. This is an example taken from iTunes: http://share.mheroin.com/image/390y2G18123q. The "Match [all/any] of the following rules" works as a replacement for if/else.

How to simplify a C-style arithmetical expression containing variables during code generation?

I am trying to optimize expression evaluation in a compiler.
The arithmetical expressions are all C-style, and they may contain variables. I hope to simplify the expressions as much as possible.
For example, (3+100*A*B+100)*3+100 may be simplified to 409+300*A*B.
It mainly depends on the distributive law, the associative law and the commutative law.
The main difficulty I encounter is how to combine these arithmetical laws and traditional stack-scan evaluating algorithms.
Can anyone share experiences related to this or similar problems in the context of compiler building?
Compilers usually have some internal normalization rules like "constants to the left". This means a + 3 would be transformed into 3 + a, but not vice versa.
In your example,
(3+100*A*B+100)*3+100 would be normalized into
(3+100+(100*A*B))*3+100.
Now it is clear to optimize 3+100.
Another transformation might be a*C1+C2 into (a+(C2/C1))*C1 under the condition that C1 and C2 are constants. Intuitively, this normalizes "add before multiply".
Those normalizations are not optimizations. The intention is mostly to group constants together, so constant folding is more effective.
Apply constant folding combined with strength reduction during the code generation pass of the compilation. Most compiler texts will provide an algorithm to implement this.

calculating user defined formulas (with c++)

We would like to have user defined formulas in our c++ program.
e.g. The value v = x + ( y - (z - 2)) / 2. Later in the program the user would define x,y and z -> the program should return the result of the calculation. Somewhen later the formula may get changed, so the next time the program should parse the formula and add the new values. Any ideas / hints how to do something like this ? So far I just came to the solution to write a parser to calculate these formulas - maybe any ideas about that ?
If it will be used frequently and if it will be extended in the future, I would almost recommend adding either Python or Lua into your code. Lua is a very lightweight scripting language which you can hook into and provide new functions, operators etc. If you want to do more robust and complicated things, use Python instead.
You can represent your formula as a tree of operations and sub-expressions. You may want to define types or constants for Operation types and Variables.
You can then easily enough write a method that recurses through the tree, applying the appropriate operations to whatever values you pass in.
Building your own parser for this should be a straight-forward operation:
) convert the equation from infix to postfix notation (a typical compsci assignment) (I'd use a stack)
) wait to get the values you want
) pop the stack of infix items, dropping the value for the variable in where needed
) display results
Using Spirit (for example) to parse (and the 'semantic actions' it provides to construct an expression tree that you can then manipulate, e.g., evaluate) seems like quite a simple solution. You can find a grammar for arithmetic expressions there for example, if needed... (it's quite simple to come up with your own).
Note: Spirit is very simple to learn, and quite adapted for such tasks.
There's generally two ways of doing it, with three possible implementations:
as you've touched on yourself, a library to evaluate formulas
compiling the formula into code
The second option here is usually done either by compiling something that can be loaded in as a kind of plugin, or it can be compiled into a separate program that is then invoked and produces the necessary output.
For C++ I would guess that a library for evaluation would probably exist somewhere so that's where I would start.
If you want to write your own, search for "formal automata" and/or "finite state machine grammar"
In general what you will do is parse the string, pushing characters on a stack as you go. Then start popping the characters off and perform tasks based on what is popped. It's easier to code if you force equations to reverse-polish notation.
To make your life easier, I think getting this kind of input is best done through a GUI where users are restricted in what they can type in.
If you plan on doing it from the command line (that is the impression I get from your post), then you should probably define a strict set of allowable inputs (e.g. only single letter variables, no whitespace, and only certain mathematical symbols: ()+-*/ etc.).
Then, you will need to:
Read in the input char array
Parse it in order to build up a list of variables and actions
Carry out those actions - in BOMDAS order
With ANTLR you can create a parser/compiler that will interpret the user input, then execute the calculations using the Visitor pattern. A good example is here, but it is in C#. You should be able to adapt it quickly to your needs and remain using C++ as your development platform.