I am a solo developer on a large C++ library that I use for research (I'm a PhD student). Let's say the library has a bunch of classes that implement cool algorithms: Algorithm1, Algorithm2, etc. I then write a bunch of C-style functions that are stand-alone "scripts" that use the library to either test the recently added functionality or to run simulations that produce plots that I then include in wonderfully-brilliant (I'm in denial) journal publications. The design of the library follows good software engineering principles (to the best of my knowledge and ability), but the "scripts" that link the library from main.cpp do not follow any principle except: "get the job done".
I now have over 300 such "scripts" in a single file (20,000+ lines of code). I have no problem with it, I remain very productive, and that's really the ultimate goal. But I wonder if this approach has major weaknesses that I just have learned to live with.
// File: main.cpp
#include <cool_library/algorithm1.h>
#include <cool_library/algorithm2.h>
...
#include <cool_library/algorithmn.h>

void script1() {
    // do stuff that uses some of the cool library's algorithms and data structures
    // but none of the other scriptX() functions
}

void script2() {
    // do stuff that uses some of the included algorithms and data structures
}

...

// Main function where I comment in the *one* script I want to run.
int main() {
    // script1();
    // script2();
    // script3();
    ...
    script271();
    return 0;
}
Edit 1: There are several goals that I have in this process:
Minimize the time it takes to start a new script function.
Make all old script functions available at my fingertips for search, so I can then copy and paste bits of those scripts into a new one. Remember, this is NOT supposed to be good design for use by others.
I don't care about the compilation time of the script file because it compiles in under a second as it is now with the 20,000 lines of code.
I use Emacs as my "IDE" by the way, in Linux, using the Autoconf/Automake/Libtool process for building the library and the scripts.
Edit 2: Based on the suggestions, I'm starting to wonder if part of the way to increase productivity in this scenario is not to restructure the code, but to customize/extend the functionality of the IDE (Emacs in my case).
If I were you, I would split that huge file into 300 smaller ones: each would have just one scriptNN() and a main() that calls only it.
Now, when you have it compiled, you will have 300 small scriptNN executables (you may need to create an appropriate Makefile for this, though).
What's nice about this is that you can now use these script executables as building blocks to be composed or called by other scripts written in bash, Python, Perl, etc.
EDIT: An explanation of how this design addresses your goals.
Time to start a new script function - simply copy one of the existing files (like the sketch below) and tweak it a little.
Make all old script functions available at my fingertips for search - Emacs can do multi-file search across all the other script files you have.
I don't care about the compilation time of the script file - it does not matter then. And you will have all of the scripts available to you at once, without editing one big main() and recompiling.
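For example, one of those small files could look like this (a minimal sketch reusing the placeholder cool_library headers and scriptNN naming from the question; the real includes and calls are whatever that particular script needs):

// File: script042.cpp -- one stand-alone script, built into its own executable
#include <cool_library/algorithm1.h>

void script42() {
    // do stuff that uses the cool library's algorithms and data structures
}

int main() {
    script42();   // this file runs exactly one script, so nothing to comment in or out
    return 0;
}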
Your example may be a good use case for a scripting language. To be more specific, you could have all your script* C++ functions glued to some interpreter, like Lua, Python, OCaml, Guile, etc., and have your test cases written in the scripting language.
All scripting languages enable you to glue in your C (hence also C++) functions.
For Lua, see its Lua API chapter. For Python, see its Extending & Embedding Python section. For OCaml, see the Interfacing C with OCaml section. For Guile, see the Programming in C chapter.
You may wish to embed the interpreter inside your main function, or you could extend the existing interpreter with your new C++ functions (hence using a main provided by the interpreter).
Notice that using a scripting language may have a profound impact on the design and architecture of your library and software.
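For instance, with Lua the glue can stay very small. Here is a minimal sketch (assuming Lua is installed and linked with -llua; cool_run is a hypothetical wrapper around one of your library's algorithms, not a real API):

// File: run_script.cpp -- embed Lua and expose one C++ function to scripts
#include <lua.hpp>    // C++-friendly header bundling lua.h, lauxlib.h, lualib.h
#include <iostream>

// Hypothetical glue function callable from Lua as cool_run(x).
static int cool_run(lua_State* L) {
    double x = luaL_checknumber(L, 1);   // first argument passed from Lua
    double result = x * x;               // stand-in for a real Algorithm1 call
    lua_pushnumber(L, result);           // push the return value back to Lua
    return 1;                            // number of results pushed
}

int main(int argc, char** argv) {
    lua_State* L = luaL_newstate();
    luaL_openlibs(L);                          // load the standard Lua libraries
    lua_register(L, "cool_run", cool_run);     // make cool_run visible to scripts
    if (luaL_dofile(L, argc > 1 ? argv[1] : "script.lua") != 0)
        std::cerr << lua_tostring(L, -1) << '\n';
    lua_close(L);
    return 0;
}

Each test case then becomes a small .lua file that calls cool_run, so adding a new "script" means editing text rather than touching main.cpp and recompiling.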
If you are comfortable with it, and it works for you, just stick with it. You said you are the only developer, then just do whatever you want. I always spend too much time thinking about things like this for my projects :P. I've learned to just focus on the important and productive things. Theoretical things only work in theory...
All the suggested answers are good, and you can even combine them. Just to add my 5 cents: your execution flow fits the Strategy and Command design patterns exactly. You may want to look at their benefits, but it's a question of benefit vs. investment.
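In this setting, a Command-style dispatcher can be as small as a map from a name to a callable, so the script to run is chosen on the command line instead of by recompiling main(). A minimal sketch (script1/script2 are the question's own placeholders):

// File: main.cpp -- pick the script to run by name at run time
#include <functional>
#include <iostream>
#include <map>
#include <string>

void script1() { /* ... */ }
void script2() { /* ... */ }

int main(int argc, char** argv) {
    std::map<std::string, std::function<void()>> scripts = {
        {"script1", script1},
        {"script2", script2},
    };
    if (argc < 2 || scripts.find(argv[1]) == scripts.end()) {
        for (const auto& s : scripts) std::cout << s.first << '\n';   // list known scripts
        return 1;
    }
    scripts[argv[1]]();   // run the requested script
    return 0;
}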
Related
Suppose I have a simple Hello, World! file in C++ or C (whatever will make it easier to use from Node.js, preferably C) and I want to run it from a Node.js file. What is the most efficient way, considering that the file will be used to boost performance (moving CPU-intensive functions from Node.js to C/C++)?
I came across addons, but it seems to me that, in order to use them, I'll have to convert a lot of code to bring it into that format. Is there an easier way?
I don't see why using child_process would be slower than other options.
I recommend:
// myCFile.c
#include <stdio.h>

int main(void) {
    // Processor-intensive computations
    int x = 1 + 2 + 3;
    // Send the result to Node via standard output
    printf("%d", x);
    // Terminate the process
    return 0;
}
Compile:
gcc -o myExecutable myCFile.c
And use child_process like this:
// File myNodeFile.js
const { exec } = require("child_process");
exec("./myExecutable", (error, stdout, stderr) => console.log(stdout));
For our image segmentation algorithm, which I had written in C++, I needed to help the full-stack developer wrap the shared library for Node.js. As far as I can see, from a day of googling around and hacking on Node.js, which is a somewhat unfamiliar world for me, there are two major options:
using node-ffi, or,
addons as you have already stated.
For 1. above, you do not need to do much. You simply need to require the ffi, ref and ref-array packages/addons in Node.js to be able to call the C API of your application code. There is a nice tutorial that I followed, which helped me get going in 15 minutes.
However, I ended up needing to choose 2. above for our project. This was because our full-stack developer was relying on some other addons that required the latest version of Node.js, and, as of this answer's posting time, the node-ffi issue board shows that it does not support the v9.x family of Node.js. Hence, I went the native addons way. It took me roughly 4 hours to understand and write the code. I am not sure if it is the most convenient/efficient way possible, but what I did was to
use buffers to allocate memory in Node.js,
write a simple addon using nan in Node.js that reinterpret_casts the char* buffer of Node.js and calls the very same C API of our shared library, and finally,
link against the shared library we had created using binding.gyp.
Apparently, Native Abstractions for Node.js (aka nan) is meant to save users from having to handle the breaking changes introduced in V8. There is another nice tutorial I found, which helped me solve my problem easily.
Finally, Scott Frees' blog site seems to have a lot of self-contained articles/examples for those who would like to go deeper. He also discusses in which situations you should prefer one approach over the other (node-ffi over native addons, for instance). Basically, what I understand is that writing native addons will be more efficient, even though for our application it did not matter much. node-ffi gives satisfactory behaviour, too, since we were solving an image segmentation problem (which anyway takes more time than the call overhead).
So, in short,
I came across the addons but it seems to me, that in order to use it I'll have to convert a lot of code to bring it to that format.
Well, not necessarily! It depends on what you are willing to achieve. It can be as easy as compiling your C++ code for a specific C-API shared library, and then writing a 20-liner wrapper in nan, which basically does some reinterpret_cast for in-place memory operations, and finally linking against the library in binding.gyp.
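For a rough idea of what such a wrapper can look like, here is a sketch only (run_segmentation is a stand-in for whatever C API the shared library actually exports, and the exact nan calls should be checked against the nan documentation for your Node.js version):

// File: addon.cc -- minimal nan wrapper around a C-style library call (sketch)
#include <nan.h>

// Hypothetical C API exported by the shared library linked in binding.gyp:
//   extern "C" int run_segmentation(char* data, size_t len);

NAN_METHOD(Segment) {
    // info[0] is assumed to be a Node.js Buffer holding the image bytes
    char*  data = node::Buffer::Data(info[0]);
    size_t len  = node::Buffer::Length(info[0]);
    // int result = run_segmentation(data, len);   // the real call would go here
    int result = data ? static_cast<int>(len) : 0; // stand-in so the sketch compiles
    info.GetReturnValue().Set(Nan::New<v8::Number>(result));
}

NAN_MODULE_INIT(Init) {
    Nan::Set(target, Nan::New("segment").ToLocalChecked(),
             Nan::GetFunction(Nan::New<v8::FunctionTemplate>(Segment)).ToLocalChecked());
}

NODE_MODULE(addon, Init)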
Is there an easier way?
Yes, there is. node-ffi can help you solve the problem under half an hour. But then, it might not be the most efficient for your scenario, or it might not be a viable option for you, as it currently does not build with the v9.x family of Node.js.
There is also the option of compiling C/C++ with Emscripten to WebAssembly for quick execution on Node. Calling WebAssembly code from JavaScript is not trivial, but it allows more flexibility for input and output parameters than communicating with a child process.
This may be kind of basic but... here goes.
If I decide to embed some kind of scripting language like Lua or Ruby into a C++ program by linking its interpreter, what does that then allow me to do in C++?
Would I be able to write Ruby or Lua code right in the .cpp file, or simply call scripts from the program?
If the latter is true, how would I do that?
Because they're scripting languages, the code is always going to be "interpreted." In reality, you aren't "calling" the script code inside your program; rather, when you reach that point, you're executing the interpreter in the context of that thread (the thread that reaches the scripting portion), which then reads the scripting language and executes the applicable machine code after interpreting it (a bit like JIT compiling, but not really, since there's no compiling involved).
Because of this, it's basically the same thing as forking the interpreter and running the script, unless you want access to your compiled program's variables from the script, or the script's variables from the compiled program. To pass values back and forth, because you're using the thread that has your compiled program's context, you should be able to store script variables on the stack as well and access them when your thread stops running the interpreter (assuming you stored the variables on the stack).
Edit: response:
You would have to write it yourself. Think about it this way: if you want to use assembly in C++, you use the asm keyword. The C++ compiler then needs to parse the source file, reach the asm keyword, and switch to the assembly compiler. The assembly compiler then goes until the closing bracket of the asm region and compiles that code.
If you want to do this, it will be a bit different, since assembly gets compiled, not interpreted (which is what you want to do). What you'll need to do is change the compiler you're using (let's say C++), so that it recognizes your own user-defined keyword. Let's say this keyword is scriptX{}. You need to change the C++ parser so that when it sees scriptX{}, it stores everything between the brackets in the read-only data section of your compiled program. You then need to add a hook in the compiled assembly file to switch the context of the thread to your script interpreter, and start the program counter at the beginning of your script section (which you put in the read-only data section of the object file).
Good luck with that...
A common reason to embed a scripting language into a program is to provide for the ability to control the program with scripts provided by the end user.
Probably the simplest example of such a script is a configuration file. Assume that your program has options, and needs to remember the options from run to run. You could write them out to a file as a binary image of your options structure, but that would be fragile, not easy to inspect or edit, and likely not portable across systems. Writing the options out in plain text with some sort of labels for which is which addresses most of those complaints, but now you need to parse that text and recover the options. Then some users want different options on Tuesdays, want to do simple arithmetic to compute one option from another, or to write one configuration file that they can use on both Windows and Linux, and pretty soon you find yourself inventing a little language to express all of those ideas and mechanisms with. At this point, there's a better way.
The languages Lua and TCL both grew out of essentially that scenario. Larger systems needed to be configured and controlled by end users. End users wanted to edit a simple text file and get immediate satisfaction, even (especially) when working with large systems that might have required hours to compile successfully.
One advantage here is that rather than inventing a programming language one feature at a time as users' needs change, you start with a complete language along with its documentation. The language designer has already made a number of tough decisions for you (how do I represent strings and numbers, what about lists, what about named values, what does if look like, etc.) and has generally also brought a carefully designed and debugged implementation to the table.
Lua is particularly easy to integrate. Reading a simple configuration file and extracting the settings from the Lua state can be done using a small subset of its C API. Once you have Lua available, it is attractive to use it for other purposes. In many cases, you will find that it is more productive to write only the innermost loops in C, and use Lua to glue those functions together and provide all the "business logic" of the application. This is how Adobe Lightroom is implemented, as well as many games on platforms ranging from simple set-top-boxes to iOS devices and even PCs.
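As a concrete illustration of how little of the C API that takes, here is a minimal sketch (the settings width and title are made-up names, and config.lua is assumed to contain plain assignments like width = 800 and title = "demo"):

// File: read_config.cpp -- pull a couple of settings out of a Lua config file
#include <lua.hpp>
#include <iostream>
#include <string>

int main() {
    lua_State* L = luaL_newstate();
    if (luaL_dofile(L, "config.lua") != 0) {   // executes the user's config file
        std::cerr << lua_tostring(L, -1) << '\n';
        return 1;
    }
    lua_getglobal(L, "width");                 // push the global 'width' onto the stack
    lua_getglobal(L, "title");                 // push the global 'title' onto the stack
    int width         = static_cast<int>(lua_tonumber(L, -2));
    std::string title = lua_tostring(L, -1);
    std::cout << title << ": " << width << '\n';
    lua_close(L);
    return 0;
}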
This is a potentially dangerous question because interdisciplinary questions and answers will be biased, but I'll have a stab at it anyway. All in good spirit!
So, here we go. I'm writing a major editing mode for Emacs for a language that Emacs has almost no support for yet. And I'm at the point where I have to decide on a way to generate project files. Below is an outline of the task ahead:
The templates have to represent project directory tree, not only single files.
The resulting files are of various formats, potentially including SGML-like languages, but not limited to them. The templates also have to generate C-like source code, Elisp source code, and plain text files such as a README, for example.
The templates must be processed in a batch upon a user-initiated action (as in: the user wants to create a project, and several files must be created in the user-appointed directory). It may be beneficial to have the ability to supervise the creation, but this is less important than the ability to run the process entirely automatically.
Bonus features:
The template language already has a user base (with the potential for reuse of existing templates).
The templates can be used for code snippets (they contain blanks which are filled in interactively once the user invokes a code-generating routine while editing a file).
Obvious things like cross-platform support and ease of use, both through a graphical interface and the command line.
I did some research, but I won't share my results (yet) so as not to bias the answers. The problem with answering this question is not that the answer is hard to find, but that it is hard to choose one from many.
I'm developing a system based on Mustache for exactly the use case that you've described. The template language itself is a very simple extension of Mustache called Groome.
I also released a command-line tool called Molt that renders Groome templates. I'd be curious to know if it does everything that you need. I'm still adding features to the tool and haven't yet announced it. Thanks.
I set out to solve a similar problem several years back, when I wanted to use Emacs to generate code out of a UML diagram (cogre), and also to generate Makefiles from project specifications. I first tried to use Tempo, but when I tried to get the templates to nest, I ran into problems. I also looked into skeleton, but that didn't quite fit the plan either.
I used Google Templates for a little while and liked the syntax, but ended up developing SRecode instead, borrowing the good bits from Google Templates. SRecode was written specifically for machine-generated code. The interaction for template insertion (aka what Tempo was written for) isn't first class in SRecode. For generating code from a data structure, however, it is very robust, has a lot of features, and automatically fills in variables. It works closely with your major mode and allows many nested templates, with control over the nested dictionary values. There is a subsystem that will take Semantic tags and generate code from them for a couple of languages. That means you can parse code in one language with Semantic and generate code in another language with SRecode using those tags. Nifty! Many parts of the CEDET reference manuals were built that way.
The templates themselves allow looping, if statements, and include statements. There are a couple of examples in SRecode for making an 'application', such as the comment writer; EDE uses it to create Makefiles, which is almost exactly what you are trying to do.
Another option is Generator, which offers “language-agnostic project bootstrapping with an emphasis on simplicity”. Installation requires Node.js and npm.
Generator’s emphasis on simplicity means it is very easy to learn how to make a template. Generator also saves you from having to reference templates by file paths – it looks for templates in ~/.generator.
However, there is no way to write README or LICENSE files for the template itself without those files being copied to the generated project. Also, post-generation commands written in the Makefile will be copied to the generated Makefile, even after they are no longer of use. Finally, the ad-hoc templating language doesn’t provide a way to escape its __lowercasevariables__ – though I can’t think of a language where that limitation would be a problem.
I'm trying to adjust some mathematical code I've written to allow for arbitrary functions, but I only seem to be able to do so by pre-defining them at compile time, which seems clunky. I'm currently using function pointers, but as far as I can see the same problem would arise with functors. To provide a simplistic example, for forward-difference differentiation the code used is:
#include <cmath>

double xsquared(double x) {
    return x * x;
}

double expx(double x) {
    return exp(x);
}

// Forward-difference approximation of the derivative of af at x with step h.
double forward(double x, double h, double (*af)(double)) {
    double answer = (af(x + h) - af(x)) / h;
    return answer;
}
Where either of the first two functions can be passed as the third argument. What I would like to do, however, is pass user input (in valid C++) rather than having to set up the functions beforehand. Any help would be greatly appreciated!
Historically the kind of functionality you're asking for has not been available in C++. The usual workaround is to embed an interpreter for a language other than C++ (Lua and Python for example are specifically designed for being integrated into C/C++ apps to allow scripting of them), or to create a new language specific to your application with your own parser, compiler, etc. However, that's changing.
Clang is a newer open-source compiler, being developed largely by Apple, that leverages LLVM. Clang is designed from the ground up to be usable not only as a compiler but also as a C++ library that you can embed into your applications. I haven't tried it myself, but you should be able to do what you want with Clang: link it as a library and ask it to compile the code your users type into the application.
You might try checking out how the ClamAV team already did this, so that new virus definitions can be written in C.
As for other compilers, I know that GCC recently added support for plugins. It may be possible to leverage that to bridge GCC and your app, but because GCC wasn't designed from the beginning to be used as a library, it might be more difficult. I'm not aware of any other compilers that have a similar ability.
As C++ is a fully compiled language, you cannot really transform user input into code unless you write your own compiler or interpreter. But in this example, it would be possible to build a simple interpreter for a domain-specific language consisting of mathematical formulae. It all depends on what you want to do.
You could always take the user's input, run it through your compiler, and then execute the resulting binary. This of course has security risks, since users could execute arbitrary code.
Probably easier is to devise a minimalist language that lets users define simple functions, parsing them in C++ to execute the proper code.
The best solution is to use an embedded language like lua or python for this type of task. See e.g. Selecting An Embedded Language for suggestions.
You may use the Tiny C Compiler as a library (libtcc).
It allows you to compile arbitrary code at run time and load it, but it only works for C, not C++.
Generally, the only way is the following:
Pass the code to a compiler and create a shared object or DLL.
Load this shared object or DLL.
Call the functions from that shared object (a sketch of this route on Linux follows).
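On Linux, that recipe looks roughly like this (a sketch only: the file names, the hard-coded user function body, and the g++ invocation are all illustrative, and error handling is minimal; build the host program with -ldl):

// File: host.cpp -- compile user-supplied code to a shared object and call it
#include <cstdlib>
#include <dlfcn.h>
#include <fstream>
#include <iostream>

int main() {
    // 1. Write the user's code out as an extern "C" function.
    std::ofstream("user_fn.cpp")
        << "#include <cmath>\n"
           "extern \"C\" double f(double x) { return x * x; }\n";

    // 2. Pass it to the compiler and create a shared object.
    std::system("g++ -shared -fPIC -o user_fn.so user_fn.cpp");

    // 3. Load the shared object and look up the symbol.
    void* handle = dlopen("./user_fn.so", RTLD_NOW);
    if (!handle) { std::cerr << dlerror() << '\n'; return 1; }
    auto f = reinterpret_cast<double (*)(double)>(dlsym(handle, "f"));

    // 4. Use it like any other function pointer (e.g. pass it to forward()).
    std::cout << f(3.0) << '\n';   // prints 9
    dlclose(handle);
    return 0;
}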
C++, unlike some other languages like Perl, isn't capable of doing runtime interpretation of itself.
Your only option here would be to allow the user to compile small shared libraries that could be dynamically-loaded by your application at runtime.
Well, there are two things you can do:
Take full advantage of Boost/C++0x lambdas to define functions at runtime.
If only mathematical formulas are needed, libraries like muParser are designed to turn a string into bytecode, which can be seen as defining a function at runtime (see the sketch below).
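For example, with muParser the user's input is just a string that gets compiled to bytecode and evaluated. A minimal sketch (assuming the muparser package is installed and linked with -lmuparser):

// File: eval.cpp -- evaluate a user-supplied expression with muParser
#include <iostream>
#include <muParser.h>

int main() {
    try {
        double x = 0.0;
        mu::Parser p;
        p.DefineVar("x", &x);           // bind the symbol "x" to our variable
        p.SetExpr("x*x + sin(x)");      // expression typed in by the user

        x = 2.0;
        std::cout << p.Eval() << '\n';  // evaluates the expression at x = 2
    } catch (mu::Parser::exception_type& e) {
        std::cerr << e.GetMsg() << '\n';
    }
    return 0;
}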
While it seems like a blow-off, there are a lot of people out there who have written equation parsers and interpreters for C++ and C, many commercial, many flawed, and all as different as faces in a crowd. One place to start is the college exercises on writing infix-to-postfix translators. Some of these systems use parenthetical grouping followed by pushing the items onto a stack, like you would find in the old HP STL library. I spent 30 seconds and found this one:
http://www.speqmath.com/tutorials/expression_parser_cpp/index.html
possible search string:"gcc 'equation parser' infix to postfix"
When you get a third-party library (C, C++), open source (LGPL, say), that does not have good documentation, what is the best way to go about understanding it so you can integrate it into your application?
The library usually has some example programs, and I end up walking through the code using gdb. Any other suggestions/best practices?
For an example, I just picked one from sourceforge.net, but it's just a broad engineering/programming question:
http://sourceforge.net/projects/aftp/
I frequently use a couple of tools to help me with this:
GNU Global. It generates cross-referencing databases and can produce hyperlinked HTML from source code. Clicking function calls will take you to their definitions, and you can see lists of all references to a function. Only works for C and perhaps C++.
Doxygen. It generates documentation from Javadoc-style comments. If you tell it to generate documentation for undocumented methods, it will give you nice summaries. It can also produce hyperlinked source code listings (and can link into the listings provided by htags).
These two tools, along with just reading code in Emacs and doing some searches with recursive grep, are how I do most of my source reverse-engineering.
One of the better ways to understand a library is to attempt to document it yourself. Trying to document it forces you to really dive in and test, test, and test again, making sure you know exactly what each statement is doing and when. Then you can really start to understand what the previous developer may have been thinking (or not thinking, for that matter).
Great question. I think that this should be addressed thoroughly, so I'm going to try to make my answer as thorough as possible.
One thing that I do when approaching large projects that I've either inherited or am contributing to is to automatically generate documentation for their sources, UML diagrams, and anything else that can ease the various amounts of A.D.D. encountered when learning a new project :)
I believe someone here already mentioned Doxygen; that's a great tool! You should look into it and write a small bash script that will automatically generate documentation for the application you're developing, in some tree structure you've set up.
One thing that I haven't seen people mention is BOUML! It's fantastic and free! It automatically generates reverse-engineered UML diagrams from existing sources, and it supports a variety of languages. I use this as a way to really capture the big picture of what's going on in terms of architecture and design before I start reading code.
If you've got the money to spare, look into Understand for %language-here%. It's absolutely great and has helped me in many ways when inheriting legacy code.
EDIT:
Try out ack (betterthangrep.com); it is a pretty convenient script for searching source trees :)
Familiarize yourself with the information available in the headers. The functions you call will be declared there. Then try to identify the valid arguments and pre-/post-conditions of the functions, as those are your primary guidance (even if they are not documented!). The example programs are your next bet.
If you have code completion/IntelliSense, I like opening up the library, typing '.' or 'namespace::', and seeing what comes up. I always find it helpful; you can navigate through the objects/namespaces and see what functionality they have. This is of course assuming it's an OOP library with relatively good naming of functions/objects.
There really isn't a silver bullet other than just rolling up your sleeves and digging into the code.
This is where we earn our money.
Three things:
(1) Try to run the test or example apps available, set low debug levels, and walk through the logs.
(2) Use a source navigator tool / cscope (available on both Windows and Linux) and browse the code to understand the flow.
(3) Also, in parallel, use gdb to step into the code while running the test/example apps.