Decision Diagram for Multiple Output Boolean Function in CUDD - binary-decision-diagram

I know that CUDD has support for ADDs (Algebraic Decision Diagrams) but I can't seem to figure out how I can use ADDs for multiple output boolean functions. The ADD for such functions would need to have multiple leaves, with each leaf representing a boolean output vector (maybe encoded as an integer). For example, 0101 as 5, 1000 as 8 and so on. Does anyone know if such DDs can be built in CUDD?
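To make the leaf encoding described above concrete, here is a minimal Python sketch. It is not CUDD code and uses no CUDD calls; it only shows how each input assignment of a multi-output function could map to a single integer terminal value. The helper names `encode_outputs` and `leaf_table`, and the four example output functions, are hypothetical.

```python
from itertools import product

def encode_outputs(bits):
    """Pack an output vector (most significant bit first) into one integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | (1 if bit else 0)
    return value

def leaf_table(functions, num_inputs):
    """Map every input assignment to the integer leaf for its output vector."""
    table = {}
    for assignment in product((0, 1), repeat=num_inputs):
        table[assignment] = encode_outputs(f(*assignment) for f in functions)
    return table

# Hypothetical 2-input, 4-output function: (AND, OR, XOR, NOR).
outputs = [
    lambda a, b: a & b,
    lambda a, b: a | b,
    lambda a, b: a ^ b,
    lambda a, b: 1 - (a | b),
]
table = leaf_table(outputs, 2)
```

In a decision diagram built from such a table, all assignments that produce the same integer would end at the same leaf.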

Related

What is the name of the data structure for and-or-lists (or and-or-trees) and where can I read about it?

I recently needed to make a data structure which was a nested list of and/or questions. Since almost every interesting thing has been discovered by someone else previously, I’m looking for the name of this data structure. It looks something like this:
'((a b c) (b d e) (c (a b) (f a)))
The interpretation is that I want to find abc or bde or caf or caa or cbf or cba, and the list encapsulates that. At the top level each item is or’ed together; sub-lists of the top level are and’ed together; sub-lists of sub-lists are or’ed again; sub-lists of those are and’ed, and sub-lists of those or’ed, ad infinitum. Note that in my example all the lists are the same length; in my real application the lists vary in length.
The code to walk such a “tree” is relatively simple, but I’m assuming that there is a name for that type of tree and there is stuff I can read about it.
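As an illustration of walking such a tree, here is a minimal Python sketch that expands the nested list into the strings it stands for. Representing the Lisp list as nested Python lists of strings, and the function name `expand`, are assumptions made for this example.

```python
def expand(node, mode="or"):
    """Expand an alternating or/and nested list into the strings it matches.

    A string is a leaf. A list in "or" mode offers each child as an
    alternative; a list in "and" mode concatenates one choice per child.
    """
    if isinstance(node, str):
        return [node]
    if mode == "or":
        results = []
        for child in node:
            results.extend(expand(child, "and"))
        return results
    # "and" mode: build every concatenation, left to right
    results = [""]
    for child in node:
        alternatives = expand(child, "or")
        results = [prefix + alt for prefix in results for alt in alternatives]
    return results

tree = [["a", "b", "c"], ["b", "d", "e"], ["c", ["a", "b"], ["f", "a"]]]
matches = expand(tree)  # abc, bde, caf, caa, cbf, cba
```

The mode flips at every level of nesting, which is what makes the structure alternate between or and and without any explicit operator tags.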
These lists are equivalent to fixed-length regular expressions (which I've seen referred to as "network expressions"), but I am particularly interested in this data structure and the representation thereof.
In general (at a very high level of abstraction) it is a context-free grammar (Wikipedia).
If you allow it to be nested infinitely deep, then it is not a regular expression, because the parentheses must match (left and right).
If you consider the expressions inside parentheses to be ordered, meaning that a and b and c is equivalent to (a and b) and c, then you get a binary expression tree (Wikipedia).
But for your particular case, it is probably disjunctive normal form (Wikipedia).
I am not sure, but my intuition says that it is a regular expression again, because you have only 2 levels of nesting (the 1st for the 'or-ed' parts and the 2nd for the 'and-ed' parts).
The trees are also a subset of DAWGs (directed acyclic word graphs), and one could construct them the same way.
In my case, I have a very small set that I have built by hand and I don't worry about getting the minimal set, but instead just want something that I can easily write down but deals with the types of simple variations I see. Basically, I have different ways of finding where I keep my .el files based upon the different directory structures of various OSes I use. (E.g. when I was working at Google, the /usr/local/emacs/site-lisp directory was actually more like /usr/local/Google/emacs/site-lisp.)
I don't need a full regex, but there are about a dozen variations, some having quite long lists of nested sub-directories (c:\users\cfclark\appData\roaming\emacs.emacs.d or some other awful thing) that I wanted to write down (and then have emacs make an automated search to find the one that is appropriate to this particular installation). And every time I go to a new job, I can simply add to the list a description of where they are in that setup.
Anyway, as that code has evolved, I found that I was doing nested or's and and's, and realized that the structure generalized to the alternating or/and/or/and/... case. So, my assumption is that someone must have discovered this before. I had hints of it myself several years ago, but didn't sit down to implement it. The Disjunctive Normal Form link mpasko256 gave is also particularly relevant. I don't normalize to that level; I still keep nested and's and or's rather than flattening to 2 levels, but I do have a distinct structure: or's at the top, then and's, then or's....

How to do multiple searches within a NSData object?

TL;DR
See this member function of NSData (in Swift here):
func rangeOfData(_ dataToFind: NSData, options mask: NSDataSearchOptions, range searchRange: NSRange) -> NSRange
I want to replace that first parameter with a Set (or other collection) of NSData and return the first match. A data pattern can be a subset of another, so return the longest match. Any ideas?
I've heard that this method uses something like Boyer–Moore in its implementation. So my extension would probably use something like Aho–Corasick.
Too long...
I want to read e-mail files.
I'm planning to read files a line at a time, so I need a line-break detection algorithm. The standard is CRLF, but I should also scan for just CR, just LF, and (for completeness) LFCR. Note that I'm reading in binary, and binary-to-text conversion is done at a later stage, so all that NSString and NSRegularExpression stuff won't help me.
I was going to write a custom routine, then I realized I could probably generalize it. I made a parse-tree node class, a skeleton class of the line generator, and a bunch of tests. I have procrastinated on the meat of the code: the input byte scanning, creating parse trees, and traversing parse trees. I'm still procrastinating by looking around here a second time, because this seems like the type of problem our programmer ancestors had to do and already have code for.
This time around, I found out about the string-matching algorithms, so hopefully I'm one step closer (besides translating them to Swift).
Or maybe I should drop generalization and call rangeOfData:options:range: in two passes each cycle, once for CR and once for LF, and reconcile them. That's inefficient, though. Or keep the custom list idea: make a tree with a branch for each pattern, unifying prefix overlaps into a branch with multiple nodes. (If searching backwards, make suffix-oriented branches!) Multiple passes here would be even more inefficient.
Note that NSData has a function to traverse its data as a series of blocks. I was going to use that function, but I have to be careful that a pattern could be split across two (or more!) blocks.
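The "first match, longest pattern wins" search described above can be sketched in a few lines. This is a naive Python illustration over `bytes`, not Swift and not Aho–Corasick: it simply tries every pattern at every position. The names `find_first_of`, `split_lines` and `LINE_BREAKS` are hypothetical, and the sketch assumes the whole buffer is in memory, so it does not address patterns split across blocks.

```python
def find_first_of(data, patterns, start=0):
    """Return (index, length) of the earliest match of any pattern, or None.

    When several patterns match at the same index, the longest one wins.
    Patterns are assumed to be non-empty.
    """
    by_length = sorted(patterns, key=len, reverse=True)
    for index in range(start, len(data)):
        for pattern in by_length:
            if data.startswith(pattern, index):
                return index, len(pattern)
    return None

LINE_BREAKS = [b"\r\n", b"\n\r", b"\r", b"\n"]

def split_lines(data):
    """Yield the lines of `data` without their terminators."""
    start = 0
    while True:
        match = find_first_of(data, LINE_BREAKS, start)
        if match is None:
            yield data[start:]
            return
        index, length = match
        yield data[start:index]
        start = index + length

lines = list(split_lines(b"a\r\nb\rc\nd\n\re"))
```

With only four short patterns the brute-force inner loop is cheap; a trie or Aho–Corasick automaton would start to pay off with a larger pattern set.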

Using multidimensional arrays to analyze sequences of RNA

I'm currently learning about multidimensional arrays and was given the task of analyzing strands of RNA sequences (given from a .txt file). Here is an example of a strand:
AUGCUUAUUAACUGAAAACAUAUGGGUAGUCGAUGA
Given this string, I am to figure out what protein this RNA strand would create. In order to do so, I am to break down each strand into codons (groups of 3). So for this example, I need to look at AUG CUU AUU AAC UGA, etc. Each of these codons represents an amino acid. So AUG is methionine (represented by 'M'), CUU is leucine (represented by 'L'), and so on. My output should therefore be a new string of amino acids (M-L-I...).
What would be the best way to approach this problem? From my understanding, I'm to create a 3-D array, let's say
int aminoAcid[4][4][4]
since there are 4 possible choices for each base (A, U, G, C). I'm not entirely sure where to go from here, though, since certain combinations will give the same amino acid.
EDIT: Am I going in the right direction if I were to first convert the string into number representations (A=0, U=1, G=2, C=3)? From there I can work better with a 3-D array, right?
You can use the 3d array to connect amino acids to different sequences. You should learn about enum and figure out how you can use enum with your array indices so that you can do something like
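aminoAcid[A][U][G] = 24
where A, U and G are enum constants rather than the character literals 'A', 'U' and 'G' (whose character codes would index far outside a 4x4x4 array), and 24 corresponds to methionine, meaning you can use another enum there. Use enums whenever you have a limited known group of items you want to represent with numbers.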
It sounds like this is just the beginning of a larger project, so you should follow good practices from the start, thinking about how you can build components that represent your problem.
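The 3-D lookup idea can be sketched as follows, in Python rather than the C-style array from the question. The base numbering (A=0, U=1, G=2, C=3) is the one proposed in the question's EDIT; the names `BASE_INDEX`, `build_table`, `AMINO_ACID` and `translate` are hypothetical. The table is filled from the standard genetic code written as one letter per codon in U, C, A, G order, with '*' marking a stop codon, and this sketch keeps translating past stop codons rather than halting at them.

```python
BASE_INDEX = {"A": 0, "U": 1, "G": 2, "C": 3}  # numbering from the question's EDIT

# Standard genetic code, one letter per codon, bases ordered U, C, A, G.
# '*' marks a stop codon.
_CODE = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def build_table():
    """Fill a 4x4x4 table so table[i][j][k] is the amino acid letter."""
    table = [[[None] * 4 for _ in range(4)] for _ in range(4)]
    letters = iter(_CODE)
    for first in "UCAG":
        for second in "UCAG":
            for third in "UCAG":
                i, j, k = BASE_INDEX[first], BASE_INDEX[second], BASE_INDEX[third]
                table[i][j][k] = next(letters)
    return table

AMINO_ACID = build_table()

def translate(strand):
    """Translate an RNA strand codon by codon; trailing bases are ignored."""
    protein = []
    for pos in range(0, len(strand) - 2, 3):
        i, j, k = (BASE_INDEX[base] for base in strand[pos:pos + 3])
        protein.append(AMINO_ACID[i][j][k])
    return "".join(protein)

protein = translate("AUGCUUAUUAACUGAAAACAUAUGGGUAGUCGAUGA")
```

Several codons storing the same letter is not a problem for the table: each of the 64 cells simply holds whichever amino acid its codon maps to.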

Two inputs on self loop, deterministic or non-deterministic state machine?

Wikipedia states that a deterministic finite automaton "produces a unique computation (or run) of the automaton for each input string".
I always understood this as there being only 1 possible path to compute any unique string. In which case, the following is a DSM.
But now I'm overthinking this and interpreting the description as each input string having a single possible path, with that path being unique from those of all other input strings. In which case, the following isn't a DSM, as '11' and '12' follow the same paths.
So my question is, is the following a DSM or NDSM?
It's still deterministic: there is only one possible path for each input from each state. Inputs 1 and 2 can only loop back to the same state. For it to be non-deterministic, an input would need multiple possible paths, for example if input 1 led to two different states from one specific state.
In short, if there are no branching paths for a specific input, and no ε-edges in the graph, it is deterministic; with no branching paths, we can determine for sure where it goes. For the one you drew above, we can always determine the path for a specific input.
It is surely a deterministic finite automaton, as it has a unique path for every move defined for any of its states.
If we input 1 to this automaton, there is only one move defined for 1, from the initial state to the final state. After reaching the final state, we don't care whether the input is 1 or 2. Had there been multiple moves defined for the same input in any state, it would be a non-deterministic finite automaton.
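The criterion in these answers (at most one target per state and input, and no ε-edges) can be checked mechanically. Below is a minimal Python sketch; the function names and the transition list `machine` are hypothetical, and since the drawing from the question is not available, the machine is only a guess at it based on the description above: one move on 1 from the initial state to the final state, then a self loop on 1 and 2.

```python
def is_deterministic(transitions):
    """transitions: iterable of (state, symbol, target) triples.

    A symbol of None stands for an ε-edge. The machine is deterministic
    when there are no ε-edges and no (state, symbol) pair has two targets.
    """
    seen = {}
    for state, symbol, target in transitions:
        if symbol is None:
            return False
        key = (state, symbol)
        if key in seen and seen[key] != target:
            return False
        seen[key] = target
    return True

def run(transitions, start, inputs):
    """Follow a deterministic machine and return the list of visited states."""
    table = {(s, a): t for s, a, t in transitions}
    state = start
    path = [state]
    for symbol in inputs:
        state = table[(state, symbol)]
        path.append(state)
    return path

# Assumed reconstruction of the machine in the question.
machine = [("q0", "1", "q1"), ("q1", "1", "q1"), ("q1", "2", "q1")]
```

Here `run(machine, "q0", "11")` and `run(machine, "q0", "12")` visit the same states, yet `is_deterministic(machine)` is still true: determinism is about each string having one run, not about different strings having different runs.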

calculating user defined formulas (with c++)

We would like to have user defined formulas in our c++ program.
e.g. The value v = x + ( y - (z - 2)) / 2. Later in the program the user would define x, y and z, and the program should return the result of the calculation. Sometime later the formula may get changed, so the next time the program should parse the formula and substitute the new values. Any ideas or hints on how to do something like this? So far the only solution I have come up with is to write a parser to calculate these formulas. Any ideas about that?
If it will be used frequently and if it will be extended in the future, I would almost recommend adding either Python or Lua into your code. Lua is a very lightweight scripting language which you can hook into and provide new functions, operators etc. If you want to do more robust and complicated things, use Python instead.
You can represent your formula as a tree of operations and sub-expressions. You may want to define types or constants for Operation types and Variables.
You can then easily enough write a method that recurses through the tree, applying the appropriate operations to whatever values you pass in.
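A tree of operations and sub-expressions with a recursive evaluator could look like the sketch below. It is written in Python rather than C++ to keep it short, and the class names `Num`, `Var` and `BinOp`, the `OPS` table and the `evaluate` function are all hypothetical. The tree for the question's formula is built by hand here; producing it from text would be the parser's job.

```python
from dataclasses import dataclass
import operator

@dataclass
class Num:
    value: float

@dataclass
class Var:
    name: str

@dataclass
class BinOp:
    op: str
    left: object
    right: object

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def evaluate(node, env):
    """Recursively evaluate a tree, looking variables up in `env`."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    return OPS[node.op](evaluate(node.left, env), evaluate(node.right, env))

# v = x + (y - (z - 2)) / 2
formula = BinOp("+", Var("x"),
                BinOp("/",
                      BinOp("-", Var("y"), BinOp("-", Var("z"), Num(2))),
                      Num(2)))
result = evaluate(formula, {"x": 1, "y": 10, "z": 4})
```

Because the tree is kept separate from the values, the same `formula` can be evaluated again whenever the user supplies new values for x, y and z.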
Building your own parser for this should be a straightforward operation:
1) convert the equation from infix to postfix notation (a typical compsci assignment; I'd use a stack)
2) wait to get the values you want
3) pop the stack of postfix items, dropping the value for each variable in where needed
4) display results
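The steps above can be sketched with the shunting-yard algorithm, one common way to do the infix-to-postfix conversion. This is a minimal Python illustration rather than C++; the names `tokenize`, `to_postfix`, `eval_postfix` and `evaluate_formula` are hypothetical. It assumes only the binary operators + - * /, parentheses, unsigned numbers and identifier-style variable names, so unary minus is not handled.

```python
import operator
import re

TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()])")
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def tokenize(text):
    """Split a formula into number, name, operator and parenthesis tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def to_postfix(tokens):
    """Step 1: infix to postfix, using a stack for pending operators."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # left-associative: pop operators of equal or higher precedence
            while stack and stack[-1] in PRECEDENCE and \
                    PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the "("
        else:
            output.append(tok)
    while stack:
        tok = stack.pop()
        if tok == "(":
            raise ValueError("unbalanced parentheses")
        output.append(tok)
    return output

def eval_postfix(postfix, variables):
    """Step 3: evaluate postfix items, substituting variable values."""
    stack = []
    for tok in postfix:
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        elif tok[0].isdigit():
            stack.append(float(tok))
        else:
            stack.append(variables[tok])
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

def evaluate_formula(text, variables):
    return eval_postfix(to_postfix(tokenize(text)), variables)

value = evaluate_formula("x + ( y - (z - 2)) / 2", {"x": 1, "y": 10, "z": 4})
```

The postfix list can be cached and re-evaluated each time the user changes x, y or z; only a changed formula needs to go through `to_postfix` again.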
Using Spirit (for example) to parse (and the 'semantic actions' it provides to construct an expression tree that you can then manipulate, e.g., evaluate) seems like quite a simple solution. You can find a grammar for arithmetic expressions there for example, if needed... (it's quite simple to come up with your own).
Note: Spirit is very simple to learn, and quite adapted for such tasks.
There are generally two ways of doing it, with three possible implementations:
1) as you've touched on yourself, a library to evaluate formulas
2) compiling the formula into code
The second option here is usually done either by compiling something that can be loaded in as a kind of plugin, or it can be compiled into a separate program that is then invoked and produces the necessary output.
For C++ I would guess that a library for evaluation would probably exist somewhere so that's where I would start.
If you want to write your own, search for "formal automata" and/or "finite state machine grammar"
In general what you will do is parse the string, pushing characters on a stack as you go. Then start popping the characters off and perform tasks based on what is popped. It's easier to code if you force equations to reverse-polish notation.
To make your life easier, I think getting this kind of input is best done through a GUI where users are restricted in what they can type in.
If you plan on doing it from the command line (that is the impression I get from your post), then you should probably define a strict set of allowable inputs (e.g. only single letter variables, no whitespace, and only certain mathematical symbols: ()+-*/ etc.).
Then, you will need to:
1) Read in the input char array
2) Parse it in order to build up a list of variables and actions
3) Carry out those actions, in BOMDAS order
With ANTLR you can create a parser/compiler that will interpret the user input, then execute the calculations using the Visitor pattern. A good example is here, but it is in C#. You should be able to adapt it quickly to your needs and remain using C++ as your development platform.