Implementing Prolog in C or C++ [closed] - c++

I was wondering what a Prolog implementation in C or C++ would look like. I am mainly interested in building it as a C or C++ library, though an interpreter app would also do. I am interested in reading about its internals, namely query execution (i.e. finding the solutions) and the associated datatypes involved. I would be glad if you could recommend any reading on the topic or offer any direct suggestions or advice. The reading might be for other OOP languages or for general OOP as well. The most exhaustive material will answer the question.

If you want to see how a Prolog system implemented in C can be used from C/C++ as a library, look at SWI-Prolog. It offers a completely bi-directional interface, including non-determinism, for Unix/Mac/Windows, and much, much more. Think of constraints.
On the other hand, you are also asking about its actual implementation. There are two ways to approach this. You can either start from the very bottom and work your way up level by level. Or you can start with Prolog, and begin with meta-interpreters that implement Prolog in Prolog. From this you can slowly dig into the gore.
The traditional approach was to start with the very bottom issues first, studying the various abstract machines. The most commonly cited one is the WAM (Warren Abstract Machine), and then there are alternatives to the WAM that you should not miss. Be prepared that it is a long way from this to a working ISO implementation. There are many issues that are only cursorily dealt with in the literature, like garbage collection and constraints. Yet, they are needed for a robust implementation.
The other approach is to first learn Prolog, and then study meta-interpreters in detail. In this manner you might learn to see Prolog from an entirely different perspective. And you might also gain insights you would not get otherwise. You can start with the classical three clause meta-interpreter which reuses much of Prolog's functionality. Depending on your interest you then can start to reify parts of it. The nice thing is that you pay (in terms of code size) almost only for the parts you want to dig in and reuse the other parts of the language.
At least in the past, this approach led to various new implementation techniques; e.g. constraints, Erlang, and binary Prolog all existed first as a "simple" meta-interpreter. Only then, after the language issues were understood, were actual implementations done.
There is also another point in favour of starting with Prolog first: What happens, if you stop your effort right in the middle of it? With the bottom-up approach you end up with a collection of defunct code. For the second approach, you have learned Prolog.

Some time ago I wrote a Prolog interpreter in C++ (really, my first C++ program), and followed a different approach instead of the (now nearly ubiquitous) WAM. Our teacher in the course on languages and compiler construction talked about an ABC algorithm, and I implemented that (I googled 'Prolog implementation ABC algorithm'; here is a PDF that I found, but I don't know it yet). Here is the core solver:
//--------------------------
// evaluation core of query
// use algorithm ABC
//
int IntlogExec::query(const Clause *q)
{
unsigned nc = 0;
ProofStack::Env *pcn, *pfn;
stkpos cn, fn;
#define PCN (pcn = ps->get(cn))
#define PFN (pfn = ps->get(fn))
UnifyStack us(vs, ts);
if (!q)
goto C;
fn = ps->push(STKNULL);
PFN->vspos = vs->reserve(q->get_nvars());
pfn->trail = ts->curr_dim();
pfn->dbpos = 0;
pfn->call = 0;
// current query is empty?
A: if (!q->get_body()) {
// search untried calls
A1: //fn = ps->curr_dim() - 1;
fn = cn;
ProofStack::Env *e = ps->get(fn);
while (e->father != STKNULL) {
if (tr && e->call && tr->exit(fn, e->call))
return -1;
if (e->call && !e->call->is_last())
break;
e = ps->get(fn = e->father);
}
if (e->father == STKNULL)
return 1;
// set call to next untried brother
cn = ps->push(PFN->father);
PCN->call = pfn->call->next();
pcn->vspos = pfn->vspos;
fn = pfn->father;
} else {
cn = ps->push(fn);
PCN->call = q->get_body();
}
A2: PFN;
pcn->dbpos = 0;
cc = pcn->call;
if (nc++ == ncycle)
{
nc = 0;
sighandler();
}
// trace the call
if (tr && tr->call(cn, cc))
return -1;
switch (cc->get_type()) {
case CallData::BUILTIN: {
BuiltIn *btp = cc->get_builtin();
pcn->trail = ts->curr_dim();
pcn->vspos = pfn->vspos;
// if evaluate OK
if (btp->eval(cc->args(), this, 0)) {
// if (tr && tr->exit(cn, cc))
// return -1;
// if (btp->retry || pcn->call->last())
// goto A1;
// pcn->call = pcn->call->next();
// goto A2;
goto A1;
}
PCN;
if (tr && tr->fail(cn, pcn->call))
return -1;
unbind(pcn->trail);
}
goto C1;
case CallData::CUT: {
stkpos gf = PFN->father;
if ( gf != STKNULL &&
pfn->call->is_last() &&
pfn->call == pcn->call->next()) {
// tail recursion optimization
ProofStack::Env *pgf = ps->get(gf);
pgf->vspos = pfn->vspos;
ASSERT(!pcn->call->is_last());
slist_iter s(tmpt);
ElemTmp *t;
while ((t = (ElemTmp*)s.next()) != 0 && t->spos > fn)
t->spos = fn;
CallData *cproc = pcn->call;
cn = ps->pop(cn - fn) - 1;
PCN->call = cproc->next();
fn = pcn->father;
goto A2;
}
pcn->trail = ts->curr_dim();
pcn->vspos = pfn->vspos;
}
goto A1;
case CallData::DISJUNCT: // replace with catenated try
pcn->vspos = pfn->vspos;
pcn->trail = ts->curr_dim();
cn = ps->push(fn);
PCN->call = cc->next(); // left side
goto A2;
case CallData::DBPRED:
// initialize DB search
pcn->dbpos = db->StartProc(cc->get_dbe());
// DB matching & unification
B: if (pcn->dbpos && (q = pcn->dbpos->get()) != 0) {
unsigned nvars = q->get_nvars();
pcn->vspos = vs->reserve(nvars);
pcn->trail = ts->curr_dim();
/*
if (!unify( pfn->vspos, cc->args(),
pcn->vspos, q->h_args(), q->h_arity()))
*/
if (q->h_arity() > 0) {
TermArgs pa1 = cc->args(),
pa2 = q->h_args();
us.clear();
for (int i = q->h_arity() - 1; i > 0; i--) {
UnifyStack::termPair *tp = us.push();
tp->t1 = pa1.getarg(i);
tp->i1 = pfn->vspos;
tp->t2 = pa2.getarg(i);
tp->i2 = pcn->vspos;
}
us.check_overflow();
if (!us.work( pa1.getarg(0), pfn->vspos,
pa2.getarg(0), pcn->vspos))
{
// undo changes
unbind(pcn->trail);
vs->pop(nvars);
// try next match
pcn->dbpos = pcn->dbpos->succ(db);
goto B;
}
}
fn = cn;
goto A;
}
break;
default:
ASSERT(0);
}
if (tr && PCN->call && tr->fail(cn, cc))
return -1;
// backtracking
C1: query_fail(ps->curr_dim() - cn);
// resume top query
C: cn = ps->curr_dim() - 1;
unbind(PCN->trail);
C2: if ((fn = pcn->father) == STKNULL)
return 0;
if ((cc = pcn->call) == 0)
goto C1;
switch (cc->get_type()) {
case CallData::CUT: { // change satisfaction path up to father
stkpos fvp = PFN->vspos;
query_fail(cn - fn + 1);
if ((cn = ps->curr_dim() - 1) != STKNULL) {
unbind(PCN->trail);
vs->pop(vs->curr_dim() - fvp);
goto C2;
}
return 0;
}
case CallData::BUILTIN: { // check builtins retry
BuiltIn *btp = cc->get_builtin();
if (btp->args & BuiltIn::retry) {
if (tr && tr->redo(cn, cc))
return -1;
// could be resatisfied
pcn->trail = ts->curr_dim();
pcn->vspos = PFN->vspos;
// if evaluate OK
if (btp->eval(cc->args(), this, 1))
goto A1;
}
// failed
goto C1;
}
case CallData::DISJUNCT: // evaluate right side
if (tr && tr->redo(cn, cc))
return -1;
pcn->call = cc->get_orelse();
goto A2;
case CallData::DBPRED: // a DB query node to retry
if (tr) { // display REDOs (TBD)
if (pcn->dbpos && pcn->dbpos->succ(db) && tr->redo(cn, cc))
return -1;
}
vs->pop(vs->curr_dim() - pcn->vspos);
pcn->dbpos = pcn->dbpos->succ(db);
PFN;
goto B;
default:
ASSERT(0);
}
return -1;
}
I'm not very proud of that code now: instead of ABC I ended up (by means of rather painful debugging) with A-A1-A2, B, C1-C-C2.
Edit: I placed the complete interpreter sources on GitHub.
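The solver above leans on three ideas that are easier to see in isolation: a term representation, unification, and a trail that lets bindings be undone on backtracking (the role of unbind and the trail positions in the listing). The following is only a minimal sketch of those ideas. All names (Term, makeVar, makeCompound, deref, unify, undoTo) are hypothetical and are not taken from the interpreter's sources, and the use of shared_ptr and the omission of the occurs check are simplifying assumptions.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical term representation: a term is either a variable or a
// compound term; an atom is a compound term with no arguments.
struct Term;
using TermPtr = std::shared_ptr<Term>;

struct Term {
    bool isVar = false;
    std::string name;           // functor name, or variable name
    std::vector<TermPtr> args;  // arguments (empty for atoms and variables)
    TermPtr binding;            // non-null once a variable has been bound
};

TermPtr makeVar(const std::string& name) {
    auto t = std::make_shared<Term>();
    t->isVar = true;
    t->name = name;
    return t;
}

TermPtr makeCompound(const std::string& name, std::vector<TermPtr> args = {}) {
    auto t = std::make_shared<Term>();
    t->name = name;
    t->args = std::move(args);
    return t;
}

// Follow variable bindings until reaching an unbound variable or a non-variable.
TermPtr deref(TermPtr t) {
    while (t->isVar && t->binding) t = t->binding;
    return t;
}

// The trail records every variable that gets bound, in order.
using Trail = std::vector<TermPtr>;

// Undo all bindings made since the trail had size `mark` (backtracking).
void undoTo(Trail& trail, std::size_t mark) {
    while (trail.size() > mark) {
        trail.back()->binding.reset();
        trail.pop_back();
    }
}

// Unification without occurs check. On failure, bindings made so far are
// left in place; the caller is expected to call undoTo with its saved mark.
bool unify(TermPtr a, TermPtr b, Trail& trail) {
    a = deref(a);
    b = deref(b);
    if (a == b) return true;
    if (a->isVar) { a->binding = b; trail.push_back(a); return true; }
    if (b->isVar) { b->binding = a; trail.push_back(b); return true; }
    if (a->name != b->name || a->args.size() != b->args.size()) return false;
    for (std::size_t i = 0; i < a->args.size(); ++i)
        if (!unify(a->args[i], b->args[i], trail)) return false;
    return true;
}
```

A solver would save trail.size() before trying a clause head, call unify on the goal and the head, and call undoTo with the saved mark before trying the next clause. Real systems typically replace the shared pointers with indices into a variable stack, much as the listing above does with vspos.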

You can start by checking the answers to this question.
You can also check the source of various open-source Prolog implementations (GNU Prolog, SWI-Prolog, YAP Prolog and more), although this might be too complicated if you just want a "naive" implementation or some Prolog-like features such as backtracking.
Finally, you should check the ISO Prolog standard.
Having said that, if you are interested in combining C and Prolog, there are some interfaces you can use; I don't think that implementing an (efficient) Prolog is a trivial task, especially considering that there are (surprisingly) many companies/organizations dedicated to it.

You might also be interested in looking at Mike Spivey's An Introduction to Logic Programming through Prolog. Both the full text of the book and an implementation of a simplified Prolog are available at the previous link. (Note: the implementation itself is written in a minimal Pascal dialect, which is translated into C for compilation. According to the author, this minimal Pascal dialect is more or less the "intersection of Pascal and C", whatever that means, so while it does not strictly satisfy the criteria, it should be quite useful for learning about Prolog.)
I also noticed Alan Mycroft's Logic Programming and Functional Nets, following this link you will find a Prolog interpreter in C++, but I don't know much about it.

Related

g++ optimization makes the program unable to run

I implemented a path planning algorithm based on D*-Lite. When I do not turn on optimization (-O0), the program runs normally. But when I turn on optimization (-O1/2/3), the program never terminates. In Visual Studio, both debug mode and release mode run normally. The code is the same in all of these cases. I don't know how to find the problem; can anyone help me?
class DstarLite {
public:
DstarLite() = delete;
DstarLite(GridStatus* a, GridStatus* b, FILE* fp)
: k_m_(0), start_(a), last_(start_), goal_(b), open_close_(fp) {}
void calculateKey(GridStatus* s);
void updateVertex(GridStatus* u);
void initialize();
void computeShortestPath();
void rePlanning(vector<pair<GridStatus*, int>>& node_change);
GridStatus* getStart();
void setStart(GridStatus* val);
GridStatus* getGoal();
private:
Fib frontier_;
double k_m_;
unordered_map<GridStatus*, handle_t>
heap_map_;
GridStatus* start_;
GridStatus* last_;
GridStatus* goal_;
FILE* open_close_;
};
void DstarLite::calculateKey(GridStatus* s) {
s->f = min(s->g, s->rhs) + heuristic(start_, s) + k_m_;
s->k2 = min(s->g, s->rhs);
}
void DstarLite::initialize() {
fprintf(open_close_, "%d %d\n", start_->x, start_->y);
fprintf(open_close_, "%d %d\n", goal_->x, goal_->y);
goal_->rhs = 0;
calculateKey(goal_);
handle_t hand = frontier_.push(goal_);
heap_map_[goal_] = hand;
}
void DstarLite::updateVertex(GridStatus* u) {
bool heap_in = heap_map_.find(u) != heap_map_.end();
if (u->g != u->rhs && heap_in) {
calculateKey(u);
frontier_.update(heap_map_[u]);
} else if (u->g != u->rhs && !heap_in) {
calculateKey(u);
handle_t hand = frontier_.push(u);
heap_map_[u] = hand;
} else if (u->g == u->rhs && heap_in) {
calculateKey(u);
frontier_.erase(heap_map_[u]);
heap_map_.erase(u);
}
}
void DstarLite::computeShortestPath() {
int count = 0;
while (smaller(frontier_.top(), start_) || !myEqual(start_->rhs, start_->g)) {
count++;
auto u = frontier_.top();
pair<double, double> k_old = {u->f, u->k2};
pair<double, double> k_new;
k_new.first = min(u->g, u->rhs) + heuristic(start_, u) + k_m_;
k_new.second = min(u->g, u->rhs);
if (k_old < k_new) {
calculateKey(u);
frontier_.update(heap_map_[u]);
} else if (myGreater(u->g, u->rhs)) {
u->g = u->rhs;
frontier_.pop();
heap_map_.erase(u);
for (auto s : neighbors(u)) {
if (s->rhs > u->g + cost(u, s)) {
s->next = u;
s->rhs = u->g + cost(u, s);
updateVertex(s);
}
}
} else {
double g_old = u->g;
u->g = kDoubleInfinity;
auto neighbor = neighbors(u);
neighbor.push_back(u);
for (auto s : neighbor) {
if (myEqual(s->rhs, cost(s, u) + g_old)) {
if (!equal(s, goal_)) {
double pp_s = kDoubleInfinity;
for (auto succ : neighbors(s)) {
double dis = succ->g + cost(succ, s);
if (dis < pp_s) {
pp_s = dis;
s->next = succ;
}
}
s->rhs = pp_s;
}
}
updateVertex(s);
}
}
}
cout << "Dstar visited nodes : " << count << endl;
}
void DstarLite::rePlanning(vector<pair<GridStatus*, int>>& node_change) {
k_m_ += heuristic(last_, start_);
last_ = start_;
for (auto change : node_change) {
GridStatus* u = change.first;
int old_threat = u->threat;
int new_threat = change.second;
double c_old;
double c_new;
u->threat = new_threat;
u->rhs += (new_threat - old_threat) * threat_factor;
updateVertex(u);
for (auto v : neighbors(u)) {
u->threat = old_threat;
c_old = cost(v, u);
u->threat = new_threat;
c_new = cost(v, u);
if (c_old > c_new) {
if (v != goal_) {
if (v->rhs > u->g + c_new) {
v->next = u;
v->rhs = u->g + c_new;
}
}
} else if (myEqual(v->rhs, c_old + u->g)) {
if (v != goal_) {
double pp_s = kDoubleInfinity;
for (auto pre : neighbors(v)) {
double dis = pre->g + cost(pre, v);
if (dis < pp_s) {
pp_s = dis;
v->next = pre;
}
}
v->rhs = pp_s;
}
}
updateVertex(v);
}
}
}
GridStatus* DstarLite::getStart() { return start_; }
void DstarLite::setStart(GridStatus* val) { start_ = val; }
GridStatus* DstarLite::getGoal() { return goal_; }
DstarLite dstar(start, goal, open_close);
dstar.initialize();
dstar.computeShortestPath();
Sorry, I thought it would be difficult to locate the problem in the code, so the code was not shown before. Now I have re-edited the question, but there is a lot of code, and the main calling part is computeShortestPath().
As you did not provide any code, we can give you only some general hints to fix such problems.
As a first assumption, your code almost certainly has one or more bugs that cause what we call undefined behaviour (UB). Because the result is undefined, anything can happen, and the behaviour often changes with optimization level, compiler version, or platform.
What you can do:
enable really ALL warnings and fix them all! Look especially for something like "comparison is always...", "use of xxx (sometimes) without initialization", "invalid pointer cast", ...
try to compile with different compilers. You should also try gcc and/or clang, even on Windows. It may be hard at first to get these compilers running on Windows, but it is really worth doing. Different compilers will give different warnings. Fixing all warnings from all compilers is a really good help!
you should use memory tracers like valgrind. I do not have much experience on Windows, but I believe such tools exist there too, maybe already integrated in your development suite. These tools are really good at finding "off by x" accesses, use of freed memory and similar problems.
if you still run into such trouble, static code analyser tools may help. Typically not as much as managers believe, because today's compilers are much better at detecting flaws than dinosaur programmers expect. The additional findings are often false positives, especially if you use modern C++. Typically you can save the money and take a class for your own education!
Review, Review, Review with other people!
cut the problem down to something small! You should spend most of your development time setting up good automated unit tests. Check every path, every function in every file. It is good to see at least 95% of all branches covered by tests. Typically these tests will also fail if you have UB in your code when you change optimization levels, compilers, or platforms.
using a debugger can be frustrating. In highly optimized code you jump all over the place and may not see where you are or how it relates to your source. And if the bug is not present at lower optimization levels, you have little chance of finding the underlying problem.
last but not least: "printf debugging". But this may also change the behaviour. In the worst case the code will always work once you add debug output. But it is a chance!
use the thread and memory sanitizers from your compiler.
The problem is caused by the comparison of floating-point numbers. I deliberately put aside this question when I wrote the code before :). Now it can operate normally after being fixed.
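For readers hitting the same symptom: floating-point results can differ in the last bits between optimization levels and compilers, so a loop condition that relies on exact equality of doubles may never become true. Below is a minimal sketch of tolerance-based comparison, in the spirit of the myEqual/myGreater helpers called in the question. The names approxEqual and definitelyGreater, the value of kEpsilon, and the relative-scaling rule are assumptions for illustration, not the asker's actual fix.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical tolerance; a suitable value depends on the magnitude of the costs.
constexpr double kEpsilon = 1e-9;

// True when a and b are equal up to a tolerance scaled by their magnitude.
bool approxEqual(double a, double b, double eps = kEpsilon) {
    if (a == b) return true;                           // also covers equal infinities
    if (std::isinf(a) || std::isinf(b)) return false;  // avoid inf - inf (NaN)
    const double scale = std::max({1.0, std::fabs(a), std::fabs(b)});
    return std::fabs(a - b) <= eps * scale;
}

// True only when a exceeds b by more than the tolerance.
bool definitelyGreater(double a, double b, double eps = kEpsilon) {
    return a > b && !approxEqual(a, b, eps);
}
```

The explicit check for infinities matters in D*-Lite style code, where g and rhs values are routinely set to infinity.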

How to switch between different for loops, and would such a mechanism make any sense?

This is more of an "I'm curious whether it would make sense" question than an "I have a real problem" question; I'm interested in your opinion. If there are any syntax errors, that is because I use pseudo-code to illustrate what I intend to describe.
I have a program that uses a for-loop.
for (frame_pos = 0; frame_pos < frame_size; frame_pos++) {
ABC...
}
Now I want to add another possible way to iterate through my program.
for (frame_pos = framelist.first; framelist.hasNext; frame_pos = framelist.getNext) {
ABC...
}
So I wrote an if statement
if(a == true){
for (frame_pos = 1; frame_pos <= frame_size; frame_pos++) {
ABC...
}
}else{
for (frame_pos = framelist.first; framelist.hasNext; frame_pos = framelist.getNext) {
ABC...
}
}
But somehow I didn't like it because I had duplicated my code.
ABC...
Of course I could move everything from within my loops to a method and only invoke that method. But I was wondering, if something like
switch(a){
case(true):
for (frame_pos = 1; frame_pos <= frame_size; frame_pos++) {
break;
default:
for (frame_pos = framelist.first; framelist.hasNext; frame_pos = framelist.getNext) {
break;
}
would be possible and, if possible, useful and sensible, because I would have used it here. Of course, it doesn't necessarily have to be a switch-case; it could be some other mechanism. But my intention was/is to split the, from my point of view, atomic
for( ; ; ) {
...
}
body and recombine it.
Make ABC a function (extract method) and call that.
If you do not want to switch between the two methods at run-time, you could solve this at the preprocessor level:
#define _USENEXT /* Comment out this line to use the "counter" approach. */
...
for (
#ifdef _USENEXT
frame_pos = framelist.first; framelist.hasNext; frame_pos = framelist.getNext
#else
frame_pos = 1; frame_pos <= frame_size; frame_pos++
#endif
)
{
<some code>
}
As an alternative to defining _USENEXT in the code as in my example, one could specify it as an option when compiling. For gcc this would be -D _USENEXT.
Some languages have mechanisms to easily make such patterns reusable. C# for example, would let you write something like the following:
IEnumerable<Frame> Frames1() {
    for (frame_pos = 0; frame_pos < frame_size; frame_pos++) {
        yield return framelist[frame_pos];
    }
}
IEnumerable<Frame> Frames2() {
    for (frame_pos = framelist.first; framelist.hasNext; frame_pos = framelist.getNext) {
        yield return framelist[frame_pos];
    }
}
And then you can treat these iteration patterns as first-class objects like any other.
foreach(var frame in a? Frames1() : Frames2()) {
ABC...
}
With such a feature you can avoid implementation details like those silly boilerplatey low-level error-prone primitive for loops from C.
C++ doesn't have such a syntactic feature, but it also has a similar established mechanism for reusing iteration patterns: iterators. Writing an iterator isn't as dead simple as in C#, though :(
But standard containers already provide suitable iterators. You can then reuse any of the many existing iteration patterns provided in the standard library.
std::vector<int> v = ...;
std::set<int> s = ...;
auto are_equal = std::equal(v.begin(), v.end(), s.begin(), s.end());
Good C++ libraries will similarly provide suitable iterators too. (Yeah, good luck with that; it seems a large portion of people writing "C++ libraries" doesn't know C++)
If you really don't want to make ABC a function and then switch between two different for loops, you could instead write three functions:
int initFramepos( int a )
{
    return( a ? 1 : framelist.first );
}
int checkFramepos( int frame_pos, int a )
{
    return( a ? frame_pos < frame_size : framelist.hasNext );
}
int incrFramepos( int frame_pos, int a )
{
    return( a ? frame_pos+1 : framelist.getNext );
}
Then your for-loop could look like this:
for( frame_pos = initFramepos( a ); checkFramepos( frame_pos, a ); frame_pos = incrFramepos( frame_pos, a ) )
{
ABC
}
The clean solution (for C++ code - this will not work for C) would be to write an iterator class implementation for your specific case. Then, you could write your client code in terms of an iterator, and decide what iteration means independent of how it is implemented (you will be able to decide what iteration means for you at any point, without changing client code at all).
If you do this and provide begin and end for your type, you will be able to use the entire iterator-based algorithms library in std as a bonus (sort, copy, find/find_if, for_each, all_of, any_of, transform and accumulate are the most useful, off the top of my head).
Regarding other solutions, do not use a macro: it results in brittle code with many difficult to see caveats. As a rule of thumb, using a macro in C++ should be (close to) the last considered solution for anything.
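As a rough sketch of the iterator idea, assuming the frames live in standard containers: the loop body is written once against an iterator pair, and the choice of iteration source is made at run time. Frame, sumFrameIds and processFrames are hypothetical names, and summing ids merely stands in for ABC.

```cpp
#include <list>
#include <vector>

// Hypothetical frame type standing in for whatever the loop processes.
struct Frame {
    int id;
};

// The loop body ("ABC") is written once against an iterator pair, so it
// does not care how the frames are stored or in which order they arrive.
template <typename Iterator>
int sumFrameIds(Iterator first, Iterator last) {
    int sum = 0;
    for (; first != last; ++first) {
        sum += first->id;  // stand-in for ABC
    }
    return sum;
}

// Chooses the iteration source at run time without duplicating the body.
int processFrames(bool useCounter,
                  const std::vector<Frame>& byIndex,
                  const std::list<Frame>& byLink) {
    return useCounter ? sumFrameIds(byIndex.begin(), byIndex.end())
                      : sumFrameIds(byLink.begin(), byLink.end());
}
```

The template is instantiated once per iterator type, so the body exists once in the source even though the compiler generates two loops.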
What you are describing here is exactly the problem that the strategy pattern was meant to solve.
Basically, what you need to do here is to make each loop a method within a class, and then set one of them as your strategy. And of course you can switch between strategies whenever you want.
It will look like this:
class Strategy {
public:
    virtual ~Strategy() {}
    virtual void func () = 0;
};
class StrategyA : public Strategy {
public:
    virtual void func () {
        for (frame_pos = 0; frame_pos < frame_size; frame_pos++) {
            ABC...
        }
    }
};
class StrategyB : public Strategy {
public:
    virtual void func () {
        for (frame_pos = framelist.first; framelist.hasNext; frame_pos = framelist.getNext) {
            ABC...
        }
    }
};
class StrategyToTake {
private:
    Strategy* strategy;
public:
    void execute () {strategy->func();}
    void setStrategy (Strategy* newStrategy) {this->strategy = newStrategy;}
};

Generate all matches from a subset of regex

I need to define a bunch of vector sequences, which are all a series of L,D,R,U for left, down, right, up or x for break. There are optional parts, and either/or parts. I have been using my own invented system for noting it down, but I want to document this for other, potentially non-programmers to read.
I now want to use a subset (I don't plan on using any wildcards, or infinite repetition for example) of regex to define the vector sequence and a script to produce all possible matching strings...
/LDR/ produces ['LDR']
/LDU?R/ produces ['LDR','LDUR']
/R(LD|DR)U/ produces ['RLDU','RDRU']
/DxR[DL]U?RDRU?/ produces ['DxRDRDR','DxRDRDRU','DxRDURDR','DxRDURDRU','DxRLRDR','DxRLRDRU','DxRLURDR','DxRLURDRU']
Is there an existing library I can use to generate all matches?
EDIT
I realised I will only be needing or statements, as optional things can be specified by thing or nothing maybe a, or b, both optional could be (a|b|). Is there another language I could use to define what I am trying to do?
By translating the Java code from the link provided by @Dukeling into JavaScript, I think I have solved my problem...
var Node = function(str){
this.bracket = false;
this.children = [];
this.s = str;
this.next = null;
this.addChild = function(child){
this.children.push(child);
}
}
var printTree = function(root,prefix){
prefix = prefix.replace(/\./g, "");
for(i in root.children){
var child = root.children[i]
printTree(child, prefix + root.s);
}
if(root.children.length < 1){
console.log(prefix + root.s);
}
}
var Stack = function(){
this.arr = []
this.push = function(item){
this.arr.push(item)
}
this.pop = function(){
return this.arr.pop()
}
this.peek = function(){
return this.arr[this.arr.length-1]
}
}
var createTree = function(s){
// this line was causing errors for `a(((b|c)d)e)f` because the `(((` was only
// replacing the first two brackets.
// var s = s.replace(/(\(|\||\))(\(|\||\))/g, "$1.$2");
// this line fixes it
var s = s.replace(/[(|)]+/g, function(x){ return x.split('').join('.') });
var str = s.split('');
var stack = new Stack();
var root = new Node("");
stack.push(root); // start node
var justFinishedBrackets = false;
for(i in str){
var c = str[i]
if(c == '('){
stack.peek().next = new Node("Y"); // node after brackets
stack.peek().bracket = true; // node before brackets
} else if (c == '|' || c == ')'){
var last = stack.peek(); // for (ab|cd)e, remember b / d so we can add child e to it
while (!stack.peek().bracket){ // while not node before brackets
stack.pop();
}
last.addChild(stack.peek().next); // for (b|c)d, add d as child to b / c
} else {
if (justFinishedBrackets){
var next = stack.pop().next;
next.s = "" + c;
stack.push(next);
} else {
var n = new Node(""+c);
stack.peek().addChild(n);
stack.push(n);
}
}
justFinishedBrackets = (c == ')');
}
return root;
}
// Test it out
var str = "a(c|mo(r|l))e";
var root = createTree(str);
printTree(root, "");
// Prints: ace / amore / amole
I only changed one line, to allow more than two consecutive brackets to be handled, and left the original translation in the comments
I also added a function to return an array of results, instead of printing them...
var getTree = function(root,prefix){
this.out = this.out || []
prefix = prefix.replace(/\./g, "");
for(i in root.children){
var child = root.children[i]
getTree(child, prefix + root.s, out);
}
if(root.children.length < 1){
this.out.push(prefix + root.s);
}
if(!prefix && !root.s){
var out = this.out;
this.out = null
return out;
}
}
// Test it
var str = "a(b|c)d";
var root = createTree(str);
console.log(getTree(root, ""));
// logs ["abd","acd"]
The last part, to allow for empty strings too, so... (ab|c|) means ab or c or nothing, and a convenience shortcut so that ab?c is translated into a(b|)c.
var getMatches = function(str){
str = str.replace(/(.)\?/g,"($1|)")
// replace all instances of `(???|)` with `(???|µ)`
// the µ will be stripped out later
str = str.replace(/\|\)/g,"|µ)")
// fix issues where last character is `)` by inserting token `µ`
// which will be stripped out later
str = str+"µ"
var root = createTree(str);
var res = getTree(root, "");
// strip out token µ
for(i in res){
res[i] = res[i].replace(/µ/g,"")
}
// return the array of results
return res
}
getMatches("a(bc|de?)?f");
// Returns: ["abcf","adef","adf","af"]
The last part is a little hack-ish, as it relies on µ not being in the string (not an issue for me), and it solves one bug, where a ) at the end of the input string was causing incorrect output, by inserting a µ at the end of each string and then stripping it from the results. I would be happy for someone to suggest a better way to handle these issues, so it can work as a more general solution.
This code as it stands does everything I need. Thanks for all your help!
I'd imagine what you're trying is quite easy with a tree (as long as it's only or-statements).
Parse a(b|c)d (or any or-statement) into a tree as follows: a has children b and c, b and c have a mutual child d. b and c can both consist of 0 or more nodes (as in c could be g(e|f)h in which case (part of) the tree would be a -> g -> e/f (2 nodes) -> h -> d or c could be empty, in which case (part of) the tree would be a -> d, but an actual physical empty node may simplify things, which you should see when trying to write the code).
Generation of the tree shouldn't be too difficult with either recursion or a stack.
Once you have a tree, it's trivial to recursively iterate through the whole thing and generate all strings.
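The same idea can be sketched without building an explicit tree: a small recursive-descent parser whose call structure mirrors the tree, returning the list of strings for each sub-pattern. This is only a sketch under stated assumptions. It handles literal characters, grouping with ( ), alternation with | (including an empty alternative), and a trailing ? on a character or group; character classes such as [DL] are not handled, the input is assumed to be well formed, and expandPattern and the parse helpers are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Strings = std::vector<std::string>;

Strings parseAlternation(const std::string& p, std::size_t& pos);

// atom := character | '(' alternation ')', optionally followed by '?'
Strings parseAtom(const std::string& p, std::size_t& pos) {
    Strings result;
    if (p[pos] == '(') {
        ++pos;  // consume '('
        result = parseAlternation(p, pos);
        if (pos < p.size() && p[pos] == ')') ++pos;  // consume ')'
    } else {
        result.push_back(std::string(1, p[pos++]));
    }
    if (pos < p.size() && p[pos] == '?') {
        ++pos;
        result.push_back("");  // optional: the empty string is also allowed
    }
    return result;
}

// sequence := atom*, ending at '|', ')' or the end of the pattern
Strings parseSequence(const std::string& p, std::size_t& pos) {
    Strings result{""};
    while (pos < p.size() && p[pos] != '|' && p[pos] != ')') {
        const Strings atom = parseAtom(p, pos);
        Strings combined;
        for (const auto& prefix : result)
            for (const auto& suffix : atom)
                combined.push_back(prefix + suffix);  // every prefix with every suffix
        result = std::move(combined);
    }
    return result;
}

// alternation := sequence ('|' sequence)*
Strings parseAlternation(const std::string& p, std::size_t& pos) {
    Strings result = parseSequence(p, pos);
    while (pos < p.size() && p[pos] == '|') {
        ++pos;  // consume '|'
        const Strings alt = parseSequence(p, pos);
        result.insert(result.end(), alt.begin(), alt.end());
    }
    return result;
}

// Returns every string matched by the pattern (duplicates are not removed).
Strings expandPattern(const std::string& pattern) {
    std::size_t pos = 0;
    return parseAlternation(pattern, pos);
}
```

With this sketch, expandPattern("R(LD|DR)U") yields RLDU and RDRU, and expandPattern("a(b|c|)d") yields abd, acd and ad.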
Also, here is a link to a similar question, providing a library or two.
EDIT:
"shouldn't be too difficult" - okay, maybe not
Here is a somewhat complicated example (Java) that may require some advanced knowledge about stacks.
Here is a slightly simpler version (Java) thanks to inserting a special character between each ((, )), |(, etc.
Note that neither of these are particularly efficient, the point is just to get the idea across.
Here is a JavaScript example that addresses parsing the (a|b) and (a|b|) possibilities, creates an array of possible substrings, and composes the matches based on this answer.
var regex = /\([RLUD]*\|[RLUD]*\|?\)/,
str = "R(LD|DR)U(R|L|)",
substrings = [], matches = [], str_tmp = str, find
while (find = regex.exec(str_tmp)){
var index = find.index
finds = find[0].split(/\|/)
substrings.push(str_tmp.substr(0, index))
if (find[0].match(/\|/g).length == 1)
substrings.push([finds[0].substr(1), finds[1].replace(/.$/, '')])
else if (find[0].match(/\|/g).length == 2){
substrings.push([finds[0].substr(1), ""])
substrings.push([finds[1], ""])
}
str_tmp = str_tmp.substr(index + find[0].length)
}
if (str_tmp) substrings.push([str_tmp])
console.log(substrings) //>>["R", ["LD", "DR"], "U", ["R", ""], ["L", ""]]
//compose matches
function printBin(tree, soFar, iterations) {
if (iterations == tree.length) matches.push(soFar)
else if (tree[iterations].length == 2){
printBin(tree, soFar + tree[iterations][0], iterations + 1)
printBin(tree, soFar + tree[iterations][1], iterations + 1)
}
else printBin(tree, soFar + tree[iterations], iterations + 1)
}
printBin(substrings, "", 0)
console.log(matches) //>>["RLDURL", "RLDUR", "RLDUL", "RLDU", "RDRURL", "RDRUR", "RDRUL", "RDRU"]

How are error-handling statements formatted?

I have a few functions that return a 1 if an error is encountered. Each function calls on a lower-level function, such that if the lower-level function returns a 1, the original function returns a 1 as well. Thus errors get passed up the chain in this way.
Here's a highly abridged version of one of these functions:
if (low_level_function()) {
[do stuff]
return 1;
}
[do other stuff]
return 0;
Should I instead declare an error variable, assign the result of low_level_function() to it, and then use the error variable in the if() statement? In other words:
int error = low_level_function();
if (error) {
[do stuff]
return 1;
}
[do other stuff]
return 0;
Or is there yet another, better way of doing this? I've never coded to account for errors before, so my experience here is rather limited.
Edit: I've reformatted the functions to better convey the nature of my code.
One reason to prefer the second form is when you don't have anything to do in the error case and you want to avoid the stair-step effect of nested if statements.
int error_flag = low_level_function();
if (!error_flag)
error_flag = second_function();
if (!error_flag)
error_flag = third_function();
return error_flag;
Of course for that specific example you can really simplify by using the short-circuiting property of ||:
return low_level_function() || second_function() || third_function();
I don't see the difference between the two approaches above.
I would recommend using exceptions, a much cleaner approach. Why reinvent the wheel? You can either use a standard exception or implement a custom exception like
You can use this also,
return low_level_function();
If low_level_function() returns nonzero on error and zero on success. Or
return low_level_function()>0? 1 : 0;
Although it's a side comment, I'll start by stating that I prefer a single exit point for any function.
One major advantage of this construction is that the error-logging statement is needed in only one place.
It is also very easy to add tracing logs for debugging purposes.
So, following this idea, I'd propose the following:
#define OK (0)
int mid_level_func(....)
{
log_entry(...);
int rc = OK;
{
...
if ((rc = low_level_func1(...)))
goto lblExit;
...
if ((rc = low_level_func2(...)))
goto lblExit;
...
lblExit:
;
}
if (OK != rc)
log_error(rc, ...);
log_exit(...);
return rc;
}
For the ones that insist on goto being 'evil' the following variation on the scheme above might help:
#define OK (0)
int mid_level_func(....)
{
log_entry(...);
int rc = OK;
do
{
...
if ((rc = low_level_func1(...)))
break;
...
if ((rc = low_level_func2(...)))
break;
...
} while (0);
if (OK != rc)
log_error(rc, ...);
log_exit(...);
return rc;
}

Throw when succeeding vs. do-while-false vs. function array vs. goto statements

Consider a class that is supposed to make parameter suggestion, given some clues, and a specific acceptance test.
Example to concretise:
Say you are guessing the cubic dimensions of a raw data file, based on the filename. The acceptance test is: total elements == file-size (assuming 1 byte per grid unit).
This requires a prioritized ordering of tests, where each test makes one or more attempts to pass the acceptance test. The first suggestion that passes is immediately returned, and no more attempts are made. If none pass, suggest nothing.
The question: Which pattern/approach would you recommend, when readability is the main concern? Also, what are the flaws and drawbacks with the following suggestions?
Method 1: Exceptions for catching a successful acceptance test
I've heard wise people say to avoid using try/catch when not catching actual exceptions. However, in this case, the result is fairly readable, and looks something like:
try {
someTest1();
someTest2();
// ...
someTestN();
}
catch(int){
// Successful return
xOut = x_; yOut = y_; zOut = z_;
return;
}
xOut = -1; yOut = -1; zOut = -1;
With the inner acceptance test:
void acceptanceTest(const int x, const int y, const int z)
{
if (verify(x * y * z)) {
x_ = x; y_ = y; z_ = z;
throw 1;
}
}
Method 2: Do-while-false:
Change: Each test returns true as soon as it passes the acceptance test, and false if all of its attempts fail.
do {
if ( someTest1() ) break;
if ( someTest2() ) break;
// ...
if ( someTestN() ) break;
// All tests failed
xOut = -1; yOut = -1; zOut = -1;
return;
} while (0);
xOut = x_; yOut = y_; zOut = z_;
Acceptance test:
bool acceptanceTest(const int x, const int y, const int z)
{
if (verify(x * y * z)) {
x_ = x; y_ = y; z_ = z;
return true;
}
return false;
}
Method 3: Array of function pointers
typedef bool (TheClassName::*Function)();
Function funcs[] = { &TheClassName::someTest1,
&TheClassName::someTest2,
// ...
&TheClassName::someTestN };
for (unsigned int i = 0; i < sizeof(funcs)/sizeof(funcs[0]); ++i) {
if ( (this->*funcs[i])() ) {
xOut = x_; yOut = y_; zOut = z_;
return;
}
}
xOut = -1; yOut = -1; zOut = -1;
Test functions and acceptance test the same as for do-while-false.
Method 4: Goto
I've seen the do-while-false referred to as a disguised goto, followed by the argument that if that's the intended behavior "why not use goto?". So I'll list it as well:
if (someTest1() ) goto success;
if (someTest2() ) goto success;
// ...
if (someTestN() ) goto success;
xOut = -1; yOut = -1; zOut = -1;
return;
success:
xOut = x_; yOut = y_; zOut = z_;
return;
Test functions and acceptance test the same as for do-while-false.
Method 5: Short-circuiting logic (suggested by Mike Seymour)
if (someTest1() ||
someTest2() ||
// ...
someTestN()) {
// success
xOut = x_; yOut = y_; zOut = z_;
return;
}
xOut = -1; yOut = -1; zOut = -1;
Test functions and acceptance test the same as for do-while-false.
Edit: I should point out that Methods 2, 3, 4 and 5 differ from Method 1 by requiring a boolean return value from the acceptance test that is passed all the way back up to the function that returns the suggestion, as well as added overhead in each test function that makes multiple attempts at passing the acceptance test.
This makes me think that Method 1 has a maintainability advantage, as the control logic sits solely at the bottom level: the acceptance test.
Method 5: Short-circuiting logic
if (someTest1() ||
someTest2() ||
// ...
someTestN())
{
// success
}
This is equivalent to (and in my opinion easier to follow than) options 2 and 4, which emulate the short-circuiting behaviour with other flow control operations.
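As a minimal sketch of how this could look for the raw-file example from the question: the class `DimensionGuesser`, its two tests `fromFilename` and `asCube`, and the `name_64x64x32.raw` filename convention are all assumptions made up for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical guesser for the raw-file example from the question.
class DimensionGuesser {
public:
    DimensionGuesser(std::string name, std::size_t fileSize)
        : name_(std::move(name)), fileSize_(fileSize) {}

    // Returns true and fills the outputs with the first accepted suggestion;
    // on failure returns false and sets the outputs to -1.
    bool suggest(int& xOut, int& yOut, int& zOut) {
        if (fromFilename() || asCube()) {   // asCube() runs only if fromFilename() fails
            xOut = x_; yOut = y_; zOut = z_;
            return true;
        }
        xOut = yOut = zOut = -1;
        return false;
    }

private:
    // Acceptance test: 1 byte per grid unit, so the product must equal the file size.
    bool acceptanceTest(int x, int y, int z) {
        if (x <= 0 || y <= 0 || z <= 0) return false;
        if (static_cast<std::size_t>(x) * y * z != fileSize_) return false;
        x_ = x; y_ = y; z_ = z;
        return true;
    }

    // Test 1: look for a "<x>x<y>x<z>" pattern after the last underscore.
    bool fromFilename() {
        std::size_t pos = name_.rfind('_');
        if (pos == std::string::npos) return false;
        int x = 0, y = 0, z = 0;
        if (std::sscanf(name_.c_str() + pos + 1, "%dx%dx%d", &x, &y, &z) != 3) return false;
        return acceptanceTest(x, y, z);
    }

    // Test 2: assume a cube and try every edge length up to the cube root.
    bool asCube() {
        for (int n = 1; static_cast<std::size_t>(n) * n * n <= fileSize_; ++n) {
            if (acceptanceTest(n, n, n)) return true;
        }
        return false;
    }

    std::string name_;
    std::size_t fileSize_;
    int x_ = -1, y_ = -1, z_ = -1;
};
```

The priority order is simply the left-to-right order of the operands of `||`, so adding or reordering tests is a one-line change.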
Option 3 is very similar, but a bit more flexible; it might be a good idea if you need to apply the same pattern to different sets of tests, but is overkill if you just have a single set of tests.
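To illustrate the flexibility mentioned for option 3, the loop can be pulled out into a helper that accepts any list of tests. This is only a sketch: `firstPassing` is a hypothetical name, and it uses `std::function` rather than the member-function pointers from the question so that it works with lambdas and free functions alike.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: runs the tests in priority order and returns the
// index of the first one that passes, or -1 if none do. Tests after the
// first passing one are never called.
int firstPassing(const std::vector<std::function<bool()>>& tests) {
    for (std::size_t i = 0; i < tests.size(); ++i) {
        if (tests[i]()) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Inside the class from the question, a call might look like `firstPassing({[this] { return someTest1(); }, [this] { return someTest2(); }})`, and a different set of tests can reuse the same helper unchanged.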
Option 1 will be rather surprising to a lot of people, since exceptions are generally only used for unexpected events; although, if the tests are structured so that detecting success happens somewhere down a deep call chain, then this might be more convenient than passing a return value back up. It will certainly need documenting, and you should throw a type with a meaningful name (e.g. success), and be careful that it won't be caught by any error-handling mechanism. Exceptions are usually much slower than normal function returns, so bear that in mind if performance is an issue. Having said all that, if I were tempted to use exceptions here, I would certainly be looking for ways to simplify the structure of the tests to make a return value more convenient.
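If option 1 is used anyway, the advice about a meaningfully named type could be followed roughly as below. This is a sketch under assumptions: the `Success` struct, the `tryCubes` test, its arbitrary upper bound of 64 and the free-function layout are invented for illustration and differ from the member-based code in the question.

```cpp
// Hypothetical success signal. It deliberately does NOT derive from
// std::exception, so a generic catch (const std::exception&) error handler
// will not intercept it.
struct Success {
    int x, y, z;
};

// Acceptance test deep in the call chain: throws on success.
void acceptanceTest(int x, int y, int z, long fileSize) {
    if (static_cast<long>(x) * y * z == fileSize) {
        throw Success{x, y, z};
    }
}

// Hypothetical test making several attempts; it needs no return value.
void tryCubes(long fileSize) {
    for (int n = 1; n <= 64; ++n) {   // arbitrary upper bound for the sketch
        acceptanceTest(n, n, n, fileSize);
    }
}

// Returns true with the accepted dimensions, or false with all outputs at -1.
bool suggest(long fileSize, int& xOut, int& yOut, int& zOut) {
    try {
        tryCubes(fileSize);
    } catch (const Success& s) {
        xOut = s.x; yOut = s.y; zOut = s.z;
        return true;
    }
    xOut = yOut = zOut = -1;
    return false;
}
```

Carrying the result inside the thrown object also removes the need for the `x_`, `y_`, `z_` members, though a `catch (...)` anywhere between the throw and the handler would still swallow it.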
Well, I really would go for the simplest and easiest solution:
bool succeeded = false;
if (!succeeded)
succeeded = someTest1();
if (!succeeded)
succeeded = someTest2();
if (!succeeded)
succeeded = someTest3();
if (!succeeded)
succeeded = someTestN();
I could argue against the other solutions, but to summarize: simple good, complicated bad.
I think that the first method is more readable, more C++-like and easier to maintain than the others. Gotos and do-while-false add a little bit of confusion. There are variants of all of these methods, but I prefer the first one.