Assembling a function as needed and computing it fast - c++

There are interpreted languages out there, such as Lisp, Tcl, Perl, etc., that make it easy to define a lambda/proc/sub within your code during runtime and to evaluate it within the same session.
There are compiled languages out there, such as C++, that would execute much faster than the interpreted ones, yet defining a function within a compiled program during runtime and executing it is not easy, if at all possible.
The problem here is to do the following:
Define a function during runtime: for example, based on the initial input data derive an analytic model of the data.
Execute the above function fast in a loop: for example, apply the derived analytic model for analysing incoming data.
One solution that I saw was not very pretty:
A procedure representing the analytic model was derived in embedded Tcl based on the initial input data.
A lookup table was created by evaluating the procedure in Tcl on an array of sample points that, optimistically speaking, would cover the applicability range.
The lookup table was passed from the Tcl interpreter back to the binary (which was developed in C++).
Then the incoming data was analysed by interpolating between "close enough" values in the lookup table.
The above solution works, but has quite a few problems, both conceptual and computational. Thus the question: is it possible to define a function purely within C++ and make it available for execution within the same runtime session?
Conceptually speaking, is it possible to do something like create a function as a string, compile it in-memory, and somehow link it back into the binary that's being executed?

If you want something working right out of the box have a look at ExprTK. If you want to write an expression parser yourself check out Boost Spirit.
An alternative would be to create C++ code on the fly, compile it as a shared library (plugin) and load it at runtime. This would probably be the fastest solution.

Related

A simple program, or - reflection, serialization, type erasure etc?

I need to do some kind of profiling (let's not go into the details) for a certain function, on various inputs and in various external conditions. The function is "pure", i.e. has no (intentional) side-effects, doesn't use globals etc; also, let's assume for simplicity of this question that its parameters are simple; say, C-like fundamental types, plain structs or restricted pointers.
So, I've been thinking I need is to write me a small program which:
Determines, using the command line, what the function's inputs should be; and where the files containing longer inputs are. (Let's assume long inputs' file format is a straight up binary dump from memory, no fancy textual format.)
Read the relevant files into memory.
Invoke the function with the appropriate arguments.
Write the outputs to files, or check them against expected outputs I've also read from files.
This is not hard of course, but I feel like I'm reinventing the wheel:
I'm essentially "sublimating" the act of invoking a function. I'm sure in some higher-level languages this whole affair is a one-liner.
I would be doing de-serialization and serialization of a function's parameters. Surely some libraries do that?
If I don't hard-code the function I call, I would also have to somehow encode parameter types, or alternatively pass them in a type-erased fashion, and other such reflection bread-and-butter.
I would expect unit-test frameworks to offer this kind of functionality. I mean, what I've described is a kind of a unit test of an (almost) arbitrary function, with inputs coming from disk and from the command-line instead of where they would normally come from in a larger system.
My question: Should I just not worry about it, and go ahead and write this program? Or should I, instead, be more of a "smart-ass" and try to rummage through existing libraries for pieces of what I'm trying to do?
Note: I'm specifically limited to C++14, but your answer can make any assumption you like about the language standard version.

What language is used in this slide from Google Next

I was watching a Google Next session, as I'm interested in Google's cloud and their Go language.
Developer ecosystems/communities have their ways of doing things, cultural customs, which can be really alien to outsiders who don't have the experience to fill-in the gaps.
So I have a few noob questions:
What language is this?
What language does Google use in samples, Python, Go, or pseudo code?
Why is there a call to getFailedInserts() but the result of the get isn't assigned to anything?
Is it normal to use what I call magic strings, i.e. "WriteMutatedRecords", as instructions instead of naming a method as such or using an enum, or string consts?
The code example is Java using Apache Beam programming model (https://beam.apache.org/)
I believe the complete code from the slide is here:
https://github.com/ryanmcdowell/dataflow-dynamic-schema/blob/master/src/main/java/com/google/cloud/pso/pipeline/DynamicSchemaPipeline.java
The code from slide:
tries to insert data into a table 'events_table'
If it returns a transient error from Big Query API (for example "column 'foo' does not exist") it runs a table mutation adding 'foo' and inserts data again.
It is a pattern to create flexible tables into Big Query which is a predefined schema columnar database.
The code example looks like it is written in Scala or Java. You can tell from a number of indicators:
The code has a Java-style syntax
Methods are called on objects (e.g. input), which means it is an object-oriented language
new BigQuerySchemaMutator() is typical for a Java - style constructor
These indicators do not, however, give any indication wether it is Scala or Java. The syntax of these languages is very similar, and both are JVM - lanugages.
The strongest indicator for Scala in my opinion is that the code is written in a functional matter, and it contains two method invocations on BigQueryIO, which could either be a static method for the class BigQueryIO itself in case of Java, or is a method defined on the object BigQueryIO in Scala, which is a common design pattern in the language.
There is, however, the final ; which would only be necessary with Java.
For someone reading the code example this question is actually not important, because Apache Beam (which is the SDK that seems to be used here) is a Java library - which can be used both in Java and Scala.
The result of getFailedInserts seems to be further processed by calling .apply on it. This kind of style is called functional programming.
It's a whole different approach to programming, instead of the common procedural programming patterns found in most other lanugages. (e.g. storing something in a variable / variables in general)
Note that this example doesn't actually contain any functional programming per se (e.g. higher order functions alias lambdas), but the functional programming style is obvious.
It is always considered best practice to not have magic strings, but for such a code example they probably wanted to keep the code as simple as possible - as it is a one-liner already (allthough with line breaks).

How can I decrease complexity in library without increasing complexity elsewhere?

I am tasked to maintain and update a library which allows a computer to send commands at a hardware device and then receive its response. Currently the code is setup in such a way that every single possible command the device can receive is sent via its own function. Code repetition is everywhere; a DRY advocate's worst nightmare.
Obviously there is much opportunity for improvement. The problem is each command has a different payload. Currently the data that is to be the payload is passed to each command function in the form of arguments. It's difficult to consolidate functionality without pushing the complexity to a level that calls the library.
When a response is received from the device its data is put into an object of a class solely responsible for holding this data, they do nothing else. There are hundreds of classes which do this. These objects are then used to access the returned data by the app layer.
My objectives:
Throughly reduce code repetition
Maintain similiar level of complexity at application layer
Make it easier to add new commands
My idea:
Have one function to send a command and one to receive (the receiving function is automatically called when a response from the device is detected). Have a struct holding all command/response data which will be passed to sending function and returned by receiving function. Since each command has a corresponding enum value, have a switch statement which sets up any command specific data for sending.
Is my idea the best way to do it? Is there a design pattern I could use here? I've looked and looked but nothing seems to fit my needs.
Thanks in advance! (Please let me know if clarification is necessary)
This reminds me of the REST vs. SOA debate, albeit on a smaller physical scale.
If I understand you correctly, right now you have calls like
device->DoThing();
device->DoOtherThing();
and then sometimes I get a callback like
callback->DoneThing(ThingResult&);
callback->DoneOtherTHing(OtherThingResult&)
I suggest that the user is the key component here. Do the current library users like the interface at the level it is designed? Is the interface consistent, even if it is large?
You seem to want to propose
device->Do(ThingAndOtherThingParameters&)
callback->Done(ThingAndOtherThingResult&)
so to have a single entry point with more complex data.
The downside from a library user perspective may that now I have to use a manual switch() or other type statement to tell what really happened. While the dispatching to the appropriate result callback used to be done for me, now you have made it a burden upon the library user.
Unless this bought me as a user some level of flexibility, that I as as user wanted I would consider this a step backwards.
For your part as an implementor, one suggestion would be to go to the generic form internally, and then offer both interfaces externally. Perhaps the old specific interface could even be auto-generated somehow.
Good Luck.
Well, your question implies that there is a balance between the library's complexity and the client's. When those are the only two choices, one almost always goes with making the client's life easier. However, those are rarely really the only two choices.
Now in the text you talk about a command processing architecture where each command has a different set of data associated with it. In the olden days, this would typically be implemented with a big honking case statement in a loop, where each case called a different routine with different parameters and perhaps some setup code. Grisly. McCabe complexity analysers hate this.
These days what you can do with an OO language is use dynamic dispatch. Create a base abstract "command" class with a standard "handle()" method, and have each different command inherit from it to add their own members (to represent the different "arguments" to the different commands). Then you create a big honking array of these at startup, usually indexed by the command ID. For languages like C++ or Ada it has to be an array of pointers to "command" objects, for the dynamic dispatch to work. Then you can just call the appropriate command object for the command ID you read from the client. The big honking case statement is now handled implicitly by the dynamic dispatch.
Where you can get the big savings in this scenario is in subclassing. Do you have several commands that use the exact same parameters? Make a subclass for them, and then derive all of those commands from that subclass. Do you have several commands that have to perform the same operation on one of the parameters? Make a subclass for them with that one method implemented for that operation, and then derive all those commands from that subclass.
Your first objective should be to produce a library that decouples higher software layers from the hardware. Users of your library shouldn't care that you have a hardware device that can execute a number of functions with a different payload. They should only care what the device does in a higher level. In this sense, it is in my opinion a good thing that every command is mapped to each one function.
My plan will be:
Identify the objects the higher data layers need to get the job done. Model the objects in C++ classes from their perspective, not from the perspective of the hardware
Define the interface of the library using the above objects
Start the implementation of the library. Perhaps an intermediate layer that maps software objects to hardware objects is necessary
There are many things you can do to reduce code repetition. You can use polymorphism. Define a class with the base functionality and extend it. You can also use utility classes, that implement functions needed for many commands.

How should I go about building a simple LR parser?

I am trying to build a simple LR parser for a type of template (configuration) file that will be used to generate some other files. I've read and read about LR parsers, but I just can't seem to understand it! I understand that there is a parse stack, a state stack and a parsing table. Tokens are read onto the parse stack, and when a rule is matched then the tokens are shifted or reduced, depending on the parsing table. This continues recursively until all of the tokens are reduced and the parsing is then complete.
The problem is I don't really know how to generate the parsing table. I've read quite a few descriptions, but the language is technical and I just don't understand it. Can anyone tell me how I would go about this?
Also, how would I store things like the rules of my grammar?
http://codepad.org/oRjnKacH is a sample of the file I'm trying to parse with my attempt at a grammar for its language.
I've never done this before, so I'm just looking for some advice, thanks.
In your study of parser theory, you seem to have missed a much more practical fact: virtually nobody ever even considers hand writing a table-driven, bottom-up parser like you're discussing. For most practical purposes, hand-written parsers use a top-down (usually recursive descent) structure.
The primary reason for using a table-driven parser is that it lets you write a (fairly) small amount of code that manipulates the table and such, that's almost completely generic (i.e. it works for any parser). Then you encode everything about a specific grammar into a form that's easy for a computer to manipulate (i.e. some tables).
Obviously, it would be entirely possible to do that by hand if you really wanted to, but there's almost never a real point. Generating the tables entirely by hand would be pretty excruciating all by itself.
For example, you normally start by constructing an NFA, which is a large table -- normally, one row for each parser state, and one column for each possible input. At each cell, you encode the next state to enter when you start in that state, and then receive that input. Most of these transitions are basically empty (i.e. they just say that input isn't allowed when you're in that state). Note: since the valid transitions are so sparse, most parser generators support some way of compressing these tables, but that doesn't change the basic idea).
You then step through all of those and follow some fairly simple rules to collect sets of NFA states together to become a state in the DFA. The rules are simple enough that it's pretty easy to program them into a computer, but you have to repeat them for every cell in the NFA table, and do essentially perfect book-keeping to produce a DFA that works correctly.
A computer can and will do that quite nicely -- for it, applying a couple of simple rules to every one of twenty thousand cells in the NFA state table is a piece of cake. It's hard to imagine subjecting a person to doing the same though -- I'm pretty sure under UN guidelines, that would be illegal torture.
The classic solution is the lex/yacc combo:
http://dinosaur.compilertools.net/yacc/index.html
Or, as gnu calls them - flex/bison.
edit:
Perl has Parse::RecDescent, which is a recursive descent parser, but it may work better for simple work.
you need to read about ANTLR
I looked at the definition of your fileformat, while I am missing some of the context why you would want specifically a LR parser, my first thought was why not use existing formats like xml, or json. Going down the parsergenerator route usually has a high startup cost that will not pay off for the simple data that you are looking to parse.
As paul said lex/yacc are an option, you might also want to have a look at Boost::Spirit.
I have worked with neither, a year ago wrote a much larger parser using QLALR by the Qt/Nokia people. When I researched parsers this one even though very underdocumented had the smallest footprint to get started (only 1 tool) but it does not support lexical analysis. IIRC I could not figure out C++ support in ANTLR at that time.
10,000 mile view: In general you are looking at two components a lexer that takes the input symbols and turns them into higher order tokens. To work of the tokens your grammar description will state rules, usually you will include some code with the rules, this code will be executed when the rule is matched. The compiler generator (e.g. yacc) will take your description of the rules and the code and turn it into compilable code. Unless you are doing this by hand you would not be manipulating the tables yourself.
Well you can't understand it like
"Function A1 does f to object B, then function A2 does g to D etc"
its more like
"Function A does action {a,b,c,d,e,f,g,h,i,j,k,l,m,n,o or p, or no-op} and shifts/reduces a certain count to objects {1-1567} at stack head of type {B,C,D,E,F,or G} and its containing objects up N levels which may have types {H,I,J,K or L etc} in certain combinations according to a rule list"
It really does need a data table (or code generated from a data table like thing, like a set of BNF grammar data) telling the function what to do.
You CAN write it from scratch. You can also paint walls with eyelash brushes. You can interpret the data table at run-time. You can also put Sleep(1000); statements in your code every other line. Not that I've tried either.
Compilers is complex. Hence compiler generators.
EDIT
You are attempting to define the tokens in terms of content in the file itself.
I assume the reason you "don't want to use regexes" is that you want to be able to access line number information for different tokens within a block of text and not just for the block of text as a whole. If line numbers for each word are unnecessary, and entire blocks are going to fit into memory, I'd be inclined to model the entire bracketed block as a token, as this may increase processing speed. Either way you'll need a custom yylex function. Start by generating one with lex with fixed markers "[" and "]" for content start and end, then freeze it and modify it to take updated data about what markers to look for from the yacc code.

Lambda Expressions and Script Parsing -- Is this a good design idea?

I've written a handful of basic 2D shooter games, and they work great, as far as they go. To build upon my programming knowledge, I've decided that I would like to extend my game using a simple scripting language to control some objects. The purpose is more about the general process of design of writing a script parser / executer than the actual control of random objects.
So, my current line of thought is to make use of a container of lambda expressions (probably a map). As the parser reads each line, it will determine the type of expression. Then, once it has decided the type of instruction and discovered whatever values it has to work with, it will then open the map to the kind of expression and pass it any values it needs to work.
A more-or-less pseudo code example would be like this:
//We have determined somehow or another that this is an assignment operator
someContainerOfFunctions["assignment"](whatever_variable_we_want);
So, what do you guys think of a design like this?
Not to discourage you, but I think you would get more out of embedding something like Squirrel or Lua into your project and learning to use the API and the language itself. The upside of this is that you'll have good performance without having to think about the implementation.
Implementing scripting languages (even basic ones) from scratch is quite a task, especially when you haven't done one before.
To be honest: I don't think it's a good idea as you described, but does have potential.
This limits you with an 'annoying' burden of C++'s static number of arguments, which is may or may not what you want in your language.
Imagine this - you want to represent a function:
VM::allFunctions["functionName"](variable1);
But that function takes two arguments! How do we define a dynamic-args function? With "..." - that means stdargs.h and va_list. unfortunately, va_list has disadvantages - you have to supply an extra variable that will somehow be of an information to you of how many variables are there, so we change our fictional function call to:
VM::allFunctions["functionName"](1, variable1);
VM::allFunctions["functionWithtwoArgs"](2, variable1, variable2);
That brings you to a new problem - During runtime, there is no way to pass multiple arguments! so we will have to combine those arguments into something that can be defined and used during runtime, let's define it (hypothetically) as
typedef std::vector<Variable* > VariableList;
And our call is now:
VM::allFunctions["functionName"](varList);
VM::allFunctions["functionWithtwoArgs"](varList);
Now we get into 'scopes' - You cannot 'execute' a function without a scope - especially in embedded scripting languages where you can have several virtual machines (sandboxing, etc...), so we'll have to have a Scope type, and that changes the hypothetical call to:
currentVM->allFunctions["functionName_twoArgs"].call(varList, currentVM->currentScope);
I could continue on and on, but I think you get the point of my answer - C++ doesn't like dynamic languages, and it would most likely not change to fit it, as it will most likely change the ABI as well.
Hopefully this will take you to the right direction.
You might find value in Greg Rosenblatt's series of articles of at GameDev.net on creating a scripting engine in C++ ( http://www.gamedev.net/reference/articles/article1633.asp ).
The approach he takes seems to err on the side of minimalism and thus may be either a close fit or a good source of implementation ideas.