In C++, expression templates are a technique that relies on the compiler's knowledge of C++ expressions to simplify and optimize them beyond what would be possible in a procedural program. It's a powerful technique used by e.g. the Eigen and Armadillo matrix libraries to speed up certain compound operations on matrices. An incomplete wiki page on the Eigen website makes a start at explaining it.
I wonder if a similar technique exists in Rust, i.e. is there a way to make the Rust compiler optimize certain expressions at compile time so that as few temporaries as possible are created?
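For reference, here is roughly what the technique looks like in C++. This is only a minimal sketch with my own toy types, not Eigen's: adding two vectors returns a lightweight proxy object that records the operation, and the actual loop only runs once the result is assigned.

#include <cstddef>
#include <vector>

// Proxy recording "l + r"; nothing is computed until indexed.
template <typename L, typename R>
struct AddExpr {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
    std::size_t size() const { return l.size(); }
};

struct Vec {
    std::vector<double> data;
    double operator[](std::size_t i) const { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from an expression runs one fused loop; no
    // intermediate Vec is materialized for a + b inside a + b + c.
    template <typename E>
    Vec& operator=(const E& e) {
        data.resize(e.size());
        for (std::size_t i = 0; i < e.size(); ++i) data[i] = e[i];
        return *this;
    }
};

// A real library would constrain this operator to vector-like
// operands; it is left unconstrained here for brevity.
template <typename L, typename R>
AddExpr<L, R> operator+(const L& l, const R& r) { return {l, r}; }

With this shape, r = a + b + c builds an AddExpr<AddExpr<Vec, Vec>, Vec> at compile time and performs a single loop with no temporary vectors.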
If I read expression templates right, then you can see them in action with Rust iterators: methods such as filter, take, etc. return an expression template, a struct which represents the computation but doesn't perform it until requested. This gives you the optimization you require right away: no temporaries are created.
Using where clauses, I imagine one could write specializations to further optimize certain combinations of computations.
Does anyone know if there's a standardized process for unit testing a new language?
I mean, any new language will have basic flow control like IF, CASE, etc.
How does one normally test the language itself?
Unit testing is one strategy to achieve a goal: verify that a piece of software meets a stated specification. Let's assume you are more interested in the goal, instead of exclusively using unit testing to achieve it.
The question of verifying that a language meets a specification or exhibits specific desirable qualities is profound. The earliest work led to type theory, in which one usually extends the language with new syntax and rules to allow one to talk about well-typed programs: programs that obey these new rules.
Hand-in-hand with these extensions are mathematical proofs demonstrating that any well-typed program will exhibit various desired qualities. For example, perhaps a well-typed program will never attempt to perform integer arithmetic on a string, or try to access an out-of-bounds element of an array.
By requiring programs to be well-typed before allowing them to execute, one can effectively extend these guarantees from well-typed programs to the language itself.
Type systems can be classified by the kinds of rules they include, which in turn determine their expressive power. For example, most typed languages in common use can verify my first case above, but not the second. With the added power comes greater complexity: their type verification algorithms are correspondingly harder to write, reason about, etc.
If you want to learn more, I suggest you read this book, which will take you from the foundations of functional programming up through the common type system families.
You could look up what other languages do for testing. When I was developing a language, I was thinking about doing something like Python does: they have tests written in Python itself.
You could look up their tests. These cover, among other things: grammar, types, exceptions, and so on.
Of course, there is a lot of useful stuff there if you are looking for examples, so I recommend that you dig in :).
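If you want a concrete starting shape for such a suite, here is a minimal golden-file harness in C++. Everything here is hypothetical (the mylang interpreter, the directory layout): each test is a program in the new language plus a file holding its expected output.

#include <cstdlib>
#include <filesystem>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

namespace fs = std::filesystem;

// Reads a whole file into a string.
static std::string slurp(const fs::path& p) {
    std::ifstream in(p);
    std::ostringstream ss;
    ss << in.rdbuf();
    return ss.str();
}

int main() {
    int failed = 0;
    // tests/if.mylang, tests/case.mylang, ... each paired with
    // a .expected file holding the program's correct output.
    for (const auto& entry : fs::directory_iterator("tests")) {
        if (entry.path().extension() != ".mylang") continue;
        std::string cmd = "mylang " + entry.path().string() + " > out.txt";
        std::system(cmd.c_str());
        fs::path expected = entry.path();
        expected.replace_extension(".expected");
        if (slurp("out.txt") != slurp(expected)) {
            std::cout << "FAIL: " << entry.path() << '\n';
            ++failed;
        }
    }
    return failed == 0 ? 0 : 1;
}

Each IF or CASE feature then gets its own small program in the suite, which is essentially what Python's test files for grammar, types, and exceptions do.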
Suppose we have two grammars which define the same language: a regular one and an LALR(1) one.
Both the regular and LALR(1) parsing algorithms are O(n), where n is the input length.
Regexps are usually preferred for parsing regular languages. Why? Is there a formal proof (or maybe that's obvious) that they are faster?
You should prefer a stackless automaton over a pushdown one, as the mathematics of regular-language automata is much more fully developed.
We can determinize finite automata, but pushdown automata cannot be determinized in general (deterministic PDAs recognize strictly fewer languages), and there is no efficient minimization of PDAs either. A well-known fact is that for every PDA there exists an equivalent one with only one state, so any minimization would have to be with respect to transition count, maximum stack depth, or some other criterion.
Also, the problem of checking whether two different PDAs recognize the same language is undecidable.
There is a big difference between parsing and recognizing. Although you could build a regular-language parser, it would be extremely limited, since most useful languages are not parseable with a useful unambiguous regular grammar. However, most (if not all) regular expression libraries recognize, possibly with the addition of a finite number of "captures".
In any event, parsing really isn't the performance bottleneck anymore. IMHO, it's much better to use tools which demonstrably parse the language they appear to parse.
On the other hand, if all you want to do is recognize a language -- and the language happens to be regular -- regular expressions are a lot easier and require much less infrastructure (parser generators, special-purpose DSLs, slightly more complicated Makefiles, etc.)
(As an example of a language feature which is not regular, I give you: parentheses.)
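To make the parentheses example concrete, here is the counter-based check in C++; the unbounded counter is exactly the memory a finite automaton does not have, which is why no regular expression can recognize this language.

#include <string>

// Returns true iff the parentheses in s are balanced.
bool balanced(const std::string& s) {
    long depth = 0;
    for (char c : s) {
        if (c == '(') ++depth;
        if (c == ')' && --depth < 0) return false;  // closed too early
    }
    return depth == 0;  // everything opened was closed
}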
People prefer regular expressions because they're easier to write. If your language is a regular language, why bother creating a CFG grammar for it?
I am working with std::regex, and while reading about the various constants defined in std::regex_constants, I came across std::regex_constants::optimize. From what I've read, it sounds useful in my application (I only need one instance of the regex, initialized at the beginning, but it is used multiple times throughout the loading process).
According to working paper N3126 (p. 1077), std::regex_constants::optimize:
Specifies that the regular expression engine should pay more attention to the speed with which regular expressions are matched, and less to the speed with which regular expression objects are constructed. Otherwise it has no detectable effect on the program output.
I was curious as to what type of optimization would be performed, but there doesn't seem to be much literature about it (indeed, it seems to be implementation-defined), and one of the only things I found was at cppreference.com, which states that std::regex_constants::optimize:
Instructs the regular expression engine to make matching faster, with the potential cost of making construction slower. For example, this might mean converting a non-deterministic FSA to a deterministic FSA.
However, I have no formal background in computer science, and while I'm aware of the basics of what an FSA is, and understand the basic difference between a deterministic FSA (each state has only one possible next state for a given input) and a non-deterministic FSA (where a state can have multiple potential next states), I do not understand how this improves matching time. Also, I would be interested to know if there are any other optimizations in various STL implementations.
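For concreteness, the usage pattern I have in mind looks like this (the pattern itself is just illustrative):

#include <iostream>
#include <regex>
#include <string>

int main() {
    // Built once up front; optimize trades slower construction
    // for (potentially) faster matching on every later use.
    static const std::regex number(
        R"(\d+(\.\d+)?)",
        std::regex::ECMAScript | std::regex::optimize);

    for (std::string line : {"price 12.50", "qty 3", "n/a"}) {
        std::cout << line << " -> "
                  << (std::regex_search(line, number) ? "match" : "no match")
                  << '\n';
    }
}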
There's some useful information on the topic of regex engines and performance trade-offs (far more than can fit in a Stack Overflow answer) in Mastering Regular Expressions by Jeffrey Friedl.
It's worth noting that Boost.Regex, which was the source for N3126, documents optimize as "This currently has no effect for Boost.Regex."
P.S.
indeed, it seems to be implementation-defined
No, it's unspecified. Implementation-defined means an implementation is required to define the choice of behaviour. Implementations are not required to document how their regex engines are implemented or what (if any) difference the optimize flag makes.
P.S. 2
in various STL implementations
std::regex is not part of the STL; the C++ Standard Library is not the same thing as the STL.
See http://swtch.com/~rsc/regexp/regexp1.html for a nice explanation of how NFA-based regex implementations can avoid the exponential blow-up that backtracking matchers exhibit in certain circumstances.
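To see the kind of case the article is talking about, here is a small experiment with the classic pathological pattern; whether it actually blows up depends on your standard library's engine (most std::regex implementations backtrack):

#include <chrono>
#include <iostream>
#include <regex>
#include <string>

int main() {
    // A backtracking engine tries exponentially many ways to split
    // the a's between the two +'s before concluding there is no 'b'.
    const std::regex evil("(a+)+b");
    std::string input(28, 'a');  // all a's, no trailing 'b'

    auto t0 = std::chrono::steady_clock::now();
    bool matched = std::regex_search(input, evil);
    auto t1 = std::chrono::steady_clock::now();

    std::cout << (matched ? "match" : "no match") << " in "
              << std::chrono::duration<double>(t1 - t0).count() << " s\n";
}

A Thompson-style NFA simulation answers the same query in linear time, which is the article's central point.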
My understanding is that Clojure's homoiconicity exists to make writing macros easier.
Based on this Stack Overflow thread, it looks like macros are used sparingly, except for DSLs in which higher-order functions are not to be used.
Could someone share some examples of how macros are used in real-life?
It's correct that homoiconicity makes writing Clojure macros very easy. Basically they enable you to write code that builds whatever code you want, exploiting the "code is data" philosophy of Lisp.
Macros in a homoiconic language are also extremely powerful. There's a fun example presentation I found where they implement a LINQ-like query syntax in just three lines of Clojure.
In general, Clojure macros are potentially useful for many reasons:
Control structures - it's possible to create certain control structures using macros that can never be written as functions. For example, you can't write if as a function: a function would have to evaluate all three of its arguments, whereas a macro can arrange to evaluate only two (the condition value and either the true or the false expression); see the sketch after this list.
Compile-time optimisation - sometimes you want to optimise your code based on a constant that is known or can be computed at compile time. For example, you could create a "logging" macro that logs only if the code was compiled in debug mode but expands to nothing, with zero overhead, in the production application.
Code generation / boilerplate elimination - if you need to produce a lot of very similar code with similar structure, then you can use macros to automatically generate these from a few parameters. If you hate boilerplate, then macros are your friends.
Creating new syntax - if you see the need for a particular piece of syntax that would be useful (perhaps encapsulating a common pattern) then you can create a macro to implement this. Some DSLs for example can be simplified with additional syntax.
Creating a new language with entirely new semantics (credits to SK-logic!) - theoretically you could even go so far as to create a new language using macros, which would effectively compile your new language down into Clojure. The new language would not even have to be Lisp-like: it could parse and compile arbitrary strings, for example.
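To make the control-structure point concrete outside of Lisp: in any strict language a function receives already-evaluated arguments. Here is that problem in C++ terms, as a deliberately broken if written as a function:

#include <iostream>

// A function-based "if": both branch arguments are fully
// evaluated before the function body even runs.
int if_fn(bool cond, int then_val, int else_val) {
    return cond ? then_val : else_val;
}

int then_branch() { std::cout << "then evaluated\n"; return 1; }
int else_branch() { std::cout << "else evaluated\n"; return 2; }

int main() {
    // Prints BOTH messages before if_fn runs; a macro (or the
    // built-in if) would evaluate only the branch that is taken.
    std::cout << if_fn(true, then_branch(), else_branch()) << '\n';
}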
One important piece of advice: only use macros if you need them and functions won't work. Most problems can be solved with functions. Apart from the fact that macros are overkill for simple cases, functions have some intrinsic advantages: they are more flexible, can be stored in data structures, can be passed as parameters to higher-order functions, and are a bit easier for people to understand.
I've written a handful of basic 2D shooter games, and they work great, as far as they go. To build upon my programming knowledge, I've decided that I would like to extend my game using a simple scripting language to control some objects. The purpose is more about the general design process of writing a script parser/executor than about actually controlling random objects.
So, my current line of thought is to make use of a container of lambda expressions (probably a map). As the parser reads each line, it will determine the type of expression. Then, once it has decided the type of instruction and discovered whatever values it has to work with, it will look up that kind of expression in the map and pass the corresponding lambda the values it needs.
A more-or-less pseudocode example would be like this:
//We have determined somehow or another that this is an assignment operator
someContainerOfFunctions["assignment"](whatever_variable_we_want);
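Fleshed out a little, I imagine the container looking something like this (rough sketch, all names made up):

#include <functional>
#include <map>
#include <string>

// Each kind of statement maps to a lambda that carries it out.
std::map<std::string,
         std::function<void(const std::string&, double)>> handlers;

void setup(std::map<std::string, double>& variables) {
    // Handles a line like "x = 3.5" once the parser has pulled
    // out the variable name and the value.
    handlers["assignment"] = [&variables](const std::string& name,
                                          double value) {
        variables[name] = value;
    };
}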
So, what do you guys think of a design like this?
Not to discourage you, but I think you would get more out of embedding something like Squirrel or Lua into your project and learning to use the API and the language itself. The upside of this is that you'll have good performance without having to think about the implementation.
Implementing scripting languages (even basic ones) from scratch is quite a task, especially when you haven't done one before.
To be honest, I don't think it's a good idea as you described it, but the approach does have potential.
It saddles you with the "annoying" burden of C++'s fixed argument counts, which may or may not be what you want in your language.
Imagine this - you want to represent a function:
VM::allFunctions["functionName"](variable1);
But that function takes two arguments! How do we define a function with a dynamic number of arguments? With "..." - that means stdarg.h and va_list. Unfortunately, va_list has disadvantages: you have to supply an extra argument that tells you how many arguments there are, so we change our fictional function calls to:
VM::allFunctions["functionName"](1, variable1);
VM::allFunctions["functionWithtwoArgs"](2, variable1, variable2);
That brings you to a new problem: at runtime there is no way to build up an argument list dynamically, so we will have to bundle the arguments into something that can be defined and used at runtime. Let's define it (hypothetically) as:
typedef std::vector<Variable*> VariableList;
And our call is now:
VM::allFunctions["functionName"](varList);
VM::allFunctions["functionWithtwoArgs"](varList);
Now we get into "scopes": you cannot execute a function without a scope, especially in embedded scripting languages where you can have several virtual machines (sandboxing, etc.), so we'll have to have a Scope type, and that changes the hypothetical call to:
currentVM->allFunctions["functionName_twoArgs"].call(varList, currentVM->currentScope);
I could continue on and on, but I think you get the point of my answer: C++ doesn't lend itself to dynamic languages, and it is most likely not going to change to fit them, as that would most likely change the ABI as well.
Hopefully this points you in the right direction.
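Pulling the steps above together, here is a compilable sketch of the varList-plus-scope design. All names are hypothetical, and a real VM would use a proper variant type for values rather than a bare double:

#include <functional>
#include <iostream>
#include <map>
#include <string>
#include <vector>

// A scripting value; real VMs use a tagged union / variant.
struct Variable {
    double value = 0;
};

typedef std::vector<Variable*> VariableList;

struct Scope;

typedef std::function<void(VariableList&, Scope&)> ScriptFn;

// One sandboxed environment: its own variables and functions.
struct Scope {
    std::map<std::string, Variable> vars;
    std::map<std::string, ScriptFn> functions;
};

int main() {
    Scope vm;

    // Every handler shares one signature, so arity is a runtime
    // matter instead of being fixed by the C++ type system.
    vm.functions["print"] = [](VariableList& args, Scope&) {
        for (Variable* v : args) std::cout << v->value << ' ';
        std::cout << '\n';
    };
    vm.functions["add"] = [](VariableList& args, Scope& s) {
        s.vars["result"].value = args[0]->value + args[1]->value;
    };

    vm.vars["x"].value = 2;
    vm.vars["y"].value = 3;

    VariableList args{&vm.vars["x"], &vm.vars["y"]};
    vm.functions["add"](args, vm);

    VariableList out{&vm.vars["result"]};
    vm.functions["print"](out, vm);  // prints: 5
}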
You might find value in Greg Rosenblatt's series of articles at GameDev.net on creating a scripting engine in C++ ( http://www.gamedev.net/reference/articles/article1633.asp ).
The approach he takes seems to err on the side of minimalism and thus may be either a close fit or a good source of implementation ideas.