Does anyone know if there's a standardized process for unit testing a new language?
I mean, any new language will have basic flow control like IF, CASE, etc.
How does one normally test the language itself?
Unit testing is one strategy to achieve a goal: verify that a piece of software meets a stated specification. Let's assume you are more interested in the goal, instead of exclusively using unit testing to achieve it.
The question of verifying that a language meets a specification or exhibits specific desirable qualities is profound. The earliest work led to type theory, in which one usually extends the language with new syntax and rules to allow one to talk about well-typed programs: programs that obey these new rules.
Hand-in-hand with these extensions are mathematical proofs demonstrating that any well-typed program will exhibit various desired qualities. For example, perhaps a well-typed program will never attempt to perform integer arithmetic on a string, or try to access an out-of-bounds element of an array.
By requiring programs to be well-typed before allowing them to execute, one can effectively extend these guarantees from well-typed programs to the language itself.
Type systems can be classified by the kinds of rules they include, which in turn determine their expressive power. For example, most typed languages in common use can verify my first case above, but not the second. With the added power comes greater complexity: their type verification algorithms are correspondingly harder to write, reason about, etc.
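To make the distinction concrete, here is a small OCaml illustration (OCaml is my choice of example language; the code is illustrative only). The first kind of error is rejected by the type checker before the program runs; the second kind is not caught by common type systems, so it only surfaces at run time:

```ocaml
(* Rejected at compile time -- integer arithmetic on a string:
     let bad = 1 + "one"
   raises a type error, so this class of mistake never executes.
   Most widely used type systems catch exactly this. *)

(* Accepted by the type checker even though it is wrong: statically
   ruling out out-of-bounds accesses needs a more expressive (e.g.
   dependently typed) system, so here the error appears at run time. *)
let () =
  let a = [| 1; 2; 3 |] in
  try ignore a.(5) with
  | Invalid_argument _ -> print_endline "out of bounds"
```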
If you want to learn more, I suggest you read this book, which will take you from the foundations of functional programming up through the common type system families.
You could look up what other languages do for testing. When I was developing a language, I was thinking about doing something like Python: they have tests written in Python itself.
You could look up their tests. These cover the grammar, types, exceptions, and so on.
Of course, there is a lot of useful stuff there if you are looking for examples, so I recommend that you dig in :).
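To sketch the style (these cases are my own illustration, not taken from CPython's actual Lib/test suite): the language's own unittest module exercises core constructs such as conditionals, loop clauses, and exceptions.

```python
import unittest

# A minimal sketch of self-hosted language tests, in the spirit of
# CPython's Lib/test suite (illustrative cases, not CPython's own).
class ControlFlowTest(unittest.TestCase):
    def test_if_else(self):
        x = "taken" if 1 < 2 else "not taken"
        self.assertEqual(x, "taken")

    def test_loop_else(self):
        # The for/else clause runs only when the loop was not broken.
        ran_else = False
        for i in range(3):
            if i == 99:
                break
        else:
            ran_else = True
        self.assertTrue(ran_else)

    def test_exceptions(self):
        with self.assertRaises(ZeroDivisionError):
            1 / 0

if __name__ == "__main__":
    unittest.main(exit=False, verbosity=0)
```

Once the language can run its own test framework, every bug fix can ship with a regression test written in the language itself.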
Suppose we have two grammars which define the same language: a regular one and an LALR(1) one.
Both regular and LALR(1) algorithms are O(n) where n is input length.
Regexps are usually preferred for parsing regular languages. Why? Is there a formal proof (or maybe that's obvious) that they are faster?
You should prefer a stackless automaton over a pushdown one, as the mathematics of regular-language automata is much more developed.
We can determinize both types of automaton, but we cannot efficiently minimize a PDA. It is a well-known fact that for every PDA there exists an equivalent one with only one state, which means we would have to minimize with respect to transition count, maximum stack depth, or some other criterion.
Also, the problem of checking whether two different PDAs recognize the same language is undecidable.
There is a big difference between parsing and recognizing. Although you could build a regular-language parser, it would be extremely limited, since most useful languages are not parseable with a useful unambiguous regular grammar. However, most (if not all) regular expression libraries recognize, possibly with the addition of a finite number of "captures".
In any event, parsing really isn't the performance bottleneck anymore. IMHO, it's much better to use tools which demonstrably parse the language they appear to parse.
On the other hand, if all you want to do is recognize a language -- and the language happens to be regular -- regular expressions are a lot easier and require much less infrastructure (parser generators, special-purpose DSLs, slightly more complicated Makefiles, etc.)
(As an example of a language feature which is not regular, I give you: parentheses.)
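A quick way to see why parentheses are not regular: a DFA has only finitely many states, so it cannot track arbitrarily deep nesting, whereas a single counter (the degenerate case of a stack) handles it trivially. A minimal sketch in Python (my own illustration, not a library API):

```python
def balanced(s: str) -> bool:
    """Recognize the (non-regular) language of balanced parentheses.

    A finite automaton cannot count unboundedly; the integer `depth`
    below plays the role of a pushdown stack.
    """
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # closing paren with no matching open
                return False
    return depth == 0

# Any fixed regular expression can only match nesting up to some bounded
# depth, but "(" * n + ")" * n is in the language for every n.
```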
People prefer regular expressions because they're easier to write. If your language is regular, why bother creating a CFG grammar for it?
I'm looking for examples of usages in OCaml to demonstrate simple properties or theorems.
An example might be: given an OCaml definition of binary trees, demonstrate that the maximum number of nodes in a tree of height h is 2^(h+1)-1.
I have found this kind of example for binary trees and graphs, but nothing else. Any suggestions or links?
If you are speaking of proofs written on paper, the usages are essentially the same as with other languages: an informal reasoning based on a reasonable (but not formalized) model of the semantics of the program. To handle your case I would write two functions size and height, and prove by inductive reasoning on the tree that size t <= pow 2 (height t + 1) - 1, using the induction hypothesis on the two subtrees -- I can make this explanation more detailed but prefer to let you do it yourself if you wish.
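As a sanity check of that informal claim (not a formal proof; the names size, height, and pow2 are my own choices), the two functions can be written out and the inequality tested on sample trees:

```ocaml
(* Binary trees and the two measures used in the informal proof.
   The bound  size t <= 2^(height t + 1) - 1  follows by induction:
     size (Node (l, r)) = 1 + size l + size r
                       <= 1 + 2 * (2^(max (height l) (height r) + 1) - 1)
                        = 2^(height (Node (l, r)) + 1) - 1.          *)
type tree = Leaf | Node of tree * tree

let rec size = function
  | Leaf -> 0
  | Node (l, r) -> 1 + size l + size r

let rec height = function
  | Leaf -> -1                      (* empty tree has height -1 *)
  | Node (l, r) -> 1 + max (height l) (height r)

let pow2 n = 1 lsl n

(* Check the bound on one example tree of height 2 with 4 nodes. *)
let t = Node (Node (Leaf, Leaf), Node (Node (Leaf, Leaf), Leaf))
let () = assert (size t <= pow2 (height t + 1) - 1)
```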
If you want more formal proofs, there are several approaches.
Proof techniques based on Hoare logic have been adapted to functional programming languages. See for example the 2008 work of Régis-Gianas and Pottier, A Hoare logic for call-by-value functional programs. It provides a formal basis that can be used, still in hand-written proofs, to give a more rigorous (because down-to-the-metal) proof of your claim. It could also be used in a theorem prover, but I'm not sure this approach has been fully worked out yet.
Another natural approach would be to write your program directly in the Coq proof assistant, whose programming language is mostly a purely functional subset of OCaml, and use its facilities for proving. This is not exactly like writing in OCaml, but quite close; then you can either mirror the implementation in OCaml directly, or use the extraction facility of Coq to get honest-looking OCaml code that has been "compiled" from the Coq program. This approach has been used to formalize the implementation of the balanced binary trees present in the OCaml standard library, and the two implementations (the OCaml one and the Coq one) are sufficiently synchronized that you can transfer results to prove some OCaml-side changes correct.
In the same vein, there are some attempts to design languages for certified programming that may be more convenient, in some domains, than a general theorem prover such as Coq. Why3 is such a "software verification platform": it defines a programming language (not very far from OCaml) and a specification language on top of it. You can formulate assertions about your program and verify them using a variety of techniques such as general proof assistants (e.g. Coq) or more automated theorem provers (SMT solvers). Why3 strives to support verification of classic imperative-style algorithm implementations, but also supports a functional programming style, so it may be an interesting choice for experiments with certified programming if you don't want to go full Coq (for example if you're not interested in ensuring termination of your algorithms, which can be inconvenient in Coq).
Finally, there has been work on the following technique: read your OCaml program and automatically produce a "Coq description" of it, about which you can prove properties with the guarantee that what you prove correct also holds of the OCaml implementation. This was the main result of Arthur Charguéraud's 2010 PhD thesis, where the "Coq description" is based on the technique of "characteristic formulae". He has been able to prove correct ML implementations of relatively sophisticated algorithms such as Union-Find, as well as examples from Chris Okasaki's excellent "Purely Functional Data Structures" book.
(I frequently mention the Coq proof assistant; other tools such as Isabelle and Agda are equally suitable, but Coq is closer in syntax to the OCaml language, so is probably a good choice if you want to re-implement ML programs to prove them formally correct.)
This is from the Wikipedia article on Automatic parallelization
Automatic parallelization by compilers or tools is very difficult due to the
following reasons[2]:
- dependence analysis is hard for code using indirect addressing, pointers, recursion, and indirect function calls;
- loops have an unknown number of iterations;
- accesses to global resources are difficult to coordinate in terms of memory allocation, I/O, and shared variables.
As you can see, the problem in the first point has mostly to do with the programming language. In C/C++ you have all the problems mentioned there. So my question is: is there a language which is close to C/C++ but without these problems? I know Fortran fits the bill, but it is not even remotely like C/C++.
FORTRAN has a few unique features (probably best described as a lack of features) which allow for clean, automatic parallelization; however, with all of the recent additions to FORTRAN, it isn't clear whether the modern versions of the language maintain that distinction.
As far as comparison to C and C++ goes, FORTRAN is not too far off the beaten path in terms of logic structures; however, for sophisticated data structures, traditional FORTRAN code tended to store a structure across several associated arrays, using particular integer fields as the "next" index. This avoided pointers but offered similar flexibility.
I've seen FORTRAN implementations of hash tables, trees, and other structures back when I worked with the language more; but it definitely wasn't written to represent such things easily, and I'd imagine that such code is hard to parallelize.
There are several standards that introduce syntax for parallelization in C and C++.
One example is Unified Parallel C; another is OpenCL, though the two have vastly different purposes.
The one that probably best fits your bill is OpenMP. It extends the C and C++ language so you can tell the compiler to parallelize your code (and where to do it).
Functional programming languages (e.g. Lisp, Haskell, and F#) are by their very nature highly amenable to automatic parallelization, but they are also vastly different from procedural and OOP languages like C and C++.
If you're looking for parallelization on a GPU:
there is Microsoft's AMP;
the PGI compiler (and, in the future, more compilers) supports the OpenMP-like OpenACC;
CAPS has also released another OpenMP-like compiler called HMPP.
It is said that Blitz++ provides near-Fortran performance.
Does Fortran actually tend to be faster than regular C++ for equivalent tasks?
What about other high-level languages of exceptional runtime performance? I've heard of a few languages surpassing C++ for certain tasks... Objective Caml, Java, D...
I guess GC can make much code faster, because it removes the need for excessive copying around the stack? (assuming the code is not written for performance)
I am asking out of curiosity -- I always assumed C++ is pretty much unbeatable barring expert ASM coding.
Fortran is faster and almost always better than C++ for purely numerical code. There are many reasons why Fortran is faster. It is the oldest compiled language (a lot of knowledge has accumulated in its optimizing compilers). It is still THE language for numerical computation, so many compiler vendors make a living selling optimized compilers. There are also other, more technical reasons. Fortran (well, at least Fortran 77) does not have pointers, and thus does not have the aliasing problems which plague C and C++ in that domain. Many high-performance libraries are still coded in Fortran, with a long (over 30 years) history. Neither C nor C++ has any good array constructs (C is too low-level; C++ has as many array libraries as compilers on the planet, all incompatible with each other, thus preventing a pool of well-tested, fast code).
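The aliasing point can be illustrated in C (a sketch; restrict is C99): without it, the compiler must assume out may overlap a or b, so the store to out[i] could change a later a[i], forcing reloads and blocking vectorization. That non-overlap guarantee is exactly what a Fortran 77 compiler gets for free for its dummy array arguments.

```c
#include <stddef.h>

/* Without restrict, the compiler must assume `out` might alias `a` or
   `b`: the store to out[i] could modify a[i+1], so values must be
   reloaded each iteration and the loop is hard to vectorize. */
void add(double *out, const double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* restrict promises the three arrays do not overlap, so the compiler
   may keep values in registers and vectorize freely. */
void add_restrict(double *restrict out, const double *restrict a,
                  const double *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```

Both functions compute the same thing; only the promises made to the optimizer differ.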
Whether Fortran is faster than C++ is a matter of debate. Some say yes, some say no; I won't go into that. It depends on the compiler, the architecture you're running on, the implementation of the algorithm... etc.
Where Fortran does have a big advantage over C is the time it takes you to implement those algorithms. And that makes it extremely well suited for any kind of numerical computing. I'll state just a few obvious advantages over C:
1-based array indexing (tremendously helpful when implementing larger models; you don't have to think about it, you just FORmula TRANslate)
it has a power operator (**) (seriously, whose idea was it that a power function would do instead of an operator?!)
it has, I'd say, the best support for multidimensional arrays of all the languages currently on the market (and it doesn't seem that's going to change soon): A(1,2), just like in math
not to mention avoiding loops: A = B*C multiplies the arrays element by element (almost like MATLAB syntax, with compiled speed)
it has parallelism features built into the language (check the new standard on this one)
it interfaces very easily with languages like C and Python, so you can do your heavy-duty calculations in Fortran and the rest in the language of your choice, if you feel so inclined
completely backward compatible (since the whole of F77 is a subset of F90), so you have decades of code at your disposal
very, very portable (this might not hold for some compiler extensions, but in general it works like a charm)
a problem-solving-oriented community (since Fortran users are usually not computer scientists but mathematicians, physicists, and engineers: people with problem-solving rather than programming experience, whose knowledge about your problem can be very helpful)
Can't think of anything else off the top of my head right now, so this will have to do.
What Blitz++ is competing against is not so much the Fortran language as the man-centuries of work that have gone into Fortran math libraries. To some extent the language helps: an older language has had a lot more time to acquire optimizing compilers (and, let's face it, C++ is one of the most complex languages). On the other hand, high-level C++ libraries like Blitz++ and uBLAS allow you to state your intentions more clearly than relatively low-level Fortran code, and allow for whole new classes of compile-time optimizations.
However, using any library effectively requires developers well acquainted with the language, the library, and the mathematics. You can usually get faster code by improving any one of the three...
FORTRAN is typically faster than C++ for array processing because of the different ways the languages implement arrays: FORTRAN doesn't allow aliasing of array arguments, whereas C++ does. This makes the FORTRAN compiler's job easier. Also, FORTRAN has many very mature mathematical libraries which have been worked on for nearly 50 years; C++ has not been around that long!
This will depend a lot on the compiler and the programmers, and on whether the language has GC; it can vary too much to generalize. If it is compiled directly to machine code, then expect better performance than interpreted code most of the time, but there is only a finite amount of optimization possible before you reach assembly speed anyway.
If someone said Fortran was slightly faster, would you start a new project in it anyway?
The thing with C++ is that it is very close to the hardware. In fact, you can program at the hardware level (via assembly blocks). In general, C++ compilers do a pretty good job at optimisation (for a huge speed boost, enable "Link Time Code Generation" to allow the inlining of functions across different .cpp files), but if you know the hardware and have the know-how, you can write a few functions in assembly that work even faster (though sometimes you just can't beat the compiler).
You can also implement your own memory managers (which is something a lot of other high-level languages don't allow), so you can customize them for your specific task (maybe most allocations will be 32 bytes or less; then you can keep a giant list of 32-byte buffers that you can allocate/deallocate in O(1) time). I believe that C++ CAN beat any other language, as long as you fully understand the compiler and the hardware you are using. Most of it comes down to which algorithms you use, more than anything else.
You must be using some odd managed XML parser as you load this page then. :)
We continuously profile code, and the gain is consistent (and this is not naive C++; it is just modern C++ with Boost). It consistently outpaces any CLR implementation by at least 2x, and often by 5x or more. That's a bit closer than in the Java days, when it was around 20x faster, but you can still find good instances: simply eliminate all the System.Object bloat and clearly beat it to a pulp.
One thing managed devs don't get is that the hardware architecture works against any scaling of VM and object-root approaches. You have to see it to believe it: fire up a browser and go to a "thin" VM like Silverlight. You'll be shocked at how slow and CPU-hungry it is.
Second, kick off a database app and compare the performance: yes, managed vs. native DB.
It's usually the algorithm not the language that determines the performance ballpark that you will end up in.
Within that ballpark, optimising compilers can usually produce better code than most assembly coders.
Premature optimisation is the root of all evil
This may be the "common knowledge" that everyone can parrot, but I submit that's probably because it's correct. I await concrete evidence to the contrary.
D can sometimes be faster than C++ in practical applications, largely because the presence of garbage collection helps avoid the overhead of RAII and reference counting when using smart pointers. For programs that allocate large amounts of small objects with non-trivial lifecycles, garbage collection can be faster than C++-style memory management. Also, D's builtin arrays allow the compiler to perform better optimizations in some cases than C++'s STL vector, which the compiler doesn't understand. Furthermore, D2 supports immutable data and pure function annotations, which recent versions of DMD2 optimize based on. Walter Bright, D's creator, wrote a JavaScript interpreter in both D and C++, and according to him, the D version is faster.
C# is much faster than C++ - in C# I can write an XML parser and data processor in a tenth the time it takes me to write it in C++.
Oh, did you mean execution speed?
Even then, if you take the time from the first line of code written to the end of the first execution of the code, C# is still probably faster than C++.
This is a very interesting article about converting a C++ program to C# and the effort required to make the C++ faster than the C#.
So, if you take development speed into account, almost anything beats C++.
OK, to address the OP's runtime-only performance requirement: it's not the language, it's the implementation of the language that determines runtime performance. I could write a C++ compiler that produces the slowest code imaginable, and it would still be C++. It is also theoretically possible to write a compiler for Java that targets IA-32 instructions rather than Java VM bytecodes, giving a runtime speed boost.
The performance of your code will depend on the fit between the strengths of the language and the requirements of the code. For example, a program that does lots of memory allocation/deallocation will perform badly as a naive C++ program (i.e. one using the default memory allocator), since the C++ allocation strategy is too generalised, whereas C#'s GC-based allocator can perform better (as the link above shows). String manipulation is slow in C++ but quick in languages like PHP, Perl, etc.
It all depends on the compiler. Take for example the Stalin Scheme compiler: it beats almost all languages in the Debian microbenchmark suite. But do they mention anything about compile times?
No. I suspect (I have not used Stalin) that compiling for benchmarks (in other words, with all optimizations at maximum effort) takes a jolly long time for anything but the smallest pieces of code.
If the code is not written for performance, then C# is faster than C++.
A necessary disclaimer: All benchmarks are evil.
Here are some benchmarks in favour of C++.
The above two links show that we can find cases where C++ is faster than C# and vice versa.
Performance of a compiled language is a useless concept: what's important is the quality of the compiler, i.e. what optimizations it is able to apply. For example, often (but not always) the Intel C++ compiler produces better-performing code than g++. So how do you measure the performance of C++?
Where language semantics come in is in how easy it is for the programmer to get the compiler to create optimal output. For example, it's often easier to parallelize Fortran code than C code, which is why Fortran is still heavily used for high-performance computation (e.g. climate simulations).
As the question and some of the answers mentioned assembler: the same is true here, it's just another compiled language and thus not inherently 'faster'. The difference between assembler and other languages is that the programmer - who ideally has absolute knowledge about the program - is responsible for all of the optimizations instead of delegating some of them to the 'dumb' compiler.
E.g. function calls in assembler may use registers to pass arguments and avoid creating unnecessary stack frames, but a good compiler can do this as well (think inlining or fastcall). The downside of using assembler is that better-performing algorithms are harder to implement (think linear search vs. binary search, hash table lookup, ...).
Doing much better than C++ is mostly going to be about making the compiler understand what the programmer means. An example of this might be an instance where a compiler of any language infers that a region of code is independent of its inputs and just computes the result value at compile time.
Another example is how C# produces some very high-performance code simply because the compiler knows what particular incantations "mean" and can choose the implementation with the highest performance, whereas a transliteration of the same program into C++ results in needless alloc/delete cycles (hidden by templates), because the compiler is handling the general case instead of the particular case this piece of code presents.
A final example is the Brook/CUDA adaptations of C designed for exotic hardware that isn't so exotic anymore. The language supports the exact primitives (kernel functions) that map to the non-von Neumann hardware being compiled for.
Is that why you are using a managed browser? Because it is faster? Or a managed OS, because it is faster? No? Hang on, then it must be the SQL database... Wait, it must be the game you are playing. Stop: there must be some piece of numerical code that Java and C#, frankly, are useless with. By the way, check what your VM is written in before you slag off the root language and call it slow.
What a misconception. But hey, show me a fast managed app so we can all have a laugh. VS? OpenOffice?
Ahh... The good old question - which compiler makes faster code?
1. It only matters in code that actually spends much time at the bottom of the call stack, i.e. hot spots that don't contain function calls, such as matrix inversion, etc.
2. (Implied by 1.) It only matters in code the compiler actually sees. If your program counter spends all its time in third-party libraries you don't build, it doesn't matter.
3. In code where it does matter, it all comes down to which compiler makes better ASM, and that's largely a function of how smartly or stupidly the source code is written.
With all these variables, it's hard to distinguish between good compilers.
However, as was said, if you've got a lot of Fortran code to compile, don't re-write it.